Preface


GATE (Graduate Aptitude Test in Engineering) offers a good
opportunity to students who wish to seek admission to
postgraduate courses and avail government scholarships for a
further academic career in engineering. In this era of
competition, engineering has reached a peak position, and
engineers are driven to innovate and do something new. So there
is a great craze among aspirants to crack GATE and the PSU
examinations, along with many other competitive exams.

An aspirant would like to learn a lot in a short time, but from
less volume, with the help of a single book; this is the quality
our handbook offers. This handbook is meant to be an exhaustive
and precise collection of all subjects that come under Computer
Science and IT.


The key features of this handbook are


• Each topic is summarized in an exhaustive manner in the
form of key points and notes.


• Every topic is taken up separately along with key points
and notes.


• Focused material in its entirety to prevent ambiguity in
concepts.


I am thankful to Arihant Publications (India) Limited for
giving me this opportunity to write a book which covers
almost 100% of the GATE and PSU syllabus and thus
lights the path to your success.

I would like to thank Er. Akash Shukla (Project Coordinator)
for giving me full support during this project. My good
wishes to all the readers.


Surabhi Mishra 
Theory of Computation


Basics of Theory of Computation 


Computation is defined as any type of calculation. It is also defined as the use
of computer technology for information processing. The theory of computation
is the branch that deals with whether and how efficiently problems can be
solved on a model of computation, using an algorithm.


Basic Definitions 

Symbol A symbol is an abstract entity, e.g., letters and digits.

String A string is a finite sequence of symbols.

Alphabet An alphabet is a finite set of symbols, usually denoted by Σ.


Language A formal language is a set of strings of symbols from some 
alphabet. 


Key Points


+ Generally a, b, c, ... are used to denote symbols. Alphabets represent the
observable events of an automaton.

+ Generally w, x, y, z are used to denote words. A word represents the
behaviour of an automaton.
Kleene Closure


If Σ is the set of alphabets, then there is a language in which any string of
letters from Σ is a word, even the null string. We call this language the closure
of the alphabet. It is denoted by * (asterisk) after the name of the alphabet,
i.e., Σ*. This notation is also known as the Kleene star.


e.g., If Σ = {a}, then
Σ* = {λ, a, aa, aaa, ...}
where λ represents the null string.
If Σ = {a, b}, then
Σ* = {λ, a, b, aa, ab, ba, bb, ...}


Key Points


+ By using the Kleene star operation, we can make an infinite language of strings of
letters out of an alphabet.


+ Listing the words in increasing order of length (and alphabetically among words
of equal length) is called lexicographic order.


Positive Closure
The '+' (plus) operation is sometimes called positive closure. The positive
closure Σ+ of an alphabet never contains the null string.


e.g., If Σ = {a}, then Σ+ = {a, aa, aaa, ...}
Note If S is a set of strings, then S+ is the language S* without the word λ, provided λ is not itself in S.
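Since Σ* is infinite, a program can only list it up to some length bound. The following minimal Python sketch, assuming the alphabet is given as a set of one-character strings, lists the words of Σ* in the order described above (shorter words first, alphabetical within a length) and obtains Σ+ by dropping the null string. The function names `kleene_star` and `positive_closure` and the `max_len` bound are illustrative choices, not part of the handbook.

```python
from itertools import product

def kleene_star(alphabet, max_len):
    """Words of alphabet* with length <= max_len, shortest first and
    alphabetical within each length; '' plays the role of λ."""
    words = []
    for n in range(max_len + 1):
        # product(..., repeat=0) yields a single empty tuple, i.e. λ
        for letters in product(sorted(alphabet), repeat=n):
            words.append("".join(letters))
    return words

def positive_closure(alphabet, max_len):
    """Sigma+ is Sigma* without the null string."""
    return [w for w in kleene_star(alphabet, max_len) if w != ""]

print(kleene_star({"a", "b"}, 2))   # ['', 'a', 'b', 'aa', 'ab', 'ba', 'bb']
print(positive_closure({"a"}, 3))   # ['a', 'aa', 'aaa']
```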


Operations Over Words in Σ*
• Concatenation If x, y ∈ Σ*, then x concatenated with y is the word formed
by the symbols of x followed by the symbols of y. This is denoted by xy.


• Substring A string v is a substring of a string w if and only if there are
strings x and y such that w = xvy.

• Suffix If w = xv for some string x, then v is a suffix of w.

• Prefix If w = vy for some string y, then v is a prefix of w.

• Reversal Given a string w, its reversal, denoted by w^R, is the string spelled
backwards.


e.g., (abcd)^R = dcba
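Each definition above translates directly into a small test on Python strings. This is a minimal sketch with illustrative function names; Python's built-in `in`, `startswith` and `endswith` would do the same jobs, but spelling the checks out keeps them close to the w = xvy, w = vy and w = xv definitions.

```python
def concat(x, y):
    return x + y                      # symbols of x followed by symbols of y

def is_substring(v, w):
    # v is a substring of w iff w = x v y for some strings x, y
    return any(w[i:i + len(v)] == v for i in range(len(w) - len(v) + 1))

def is_prefix(v, w):
    # v is a prefix of w iff w = v y for some string y
    return len(v) <= len(w) and w[:len(v)] == v

def is_suffix(v, w):
    # v is a suffix of w iff w = x v for some string x
    return len(v) <= len(w) and w[len(w) - len(v):] == v

def reversal(w):
    return w[::-1]                    # w spelled backwards

print(reversal("abcd"))               # dcba
print(is_prefix("ab", "abcd"), is_suffix("cd", "abcd"), is_substring("bc", "abcd"))
```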
Alphabet Σ*


Σ* is the set of all words over a given alphabet Σ. It can be described
inductively in at least two different ways:

• Basic case The empty word λ is in Σ* (notation: λ ∈ Σ*).
• Inductive step If a ∈ Σ and x ∈ Σ*, then ax ∈ Σ* (first way), and equally
xa ∈ Σ* (second way).


• Null set The language that has no words is represented by φ.


Finite Automaton 


A Finite State Machine (FSM) or finite state automaton is an abstract 
machine used in the study of computation and language that has only a 
finite, constant amount of memory. 


[Figure: Model of finite automaton, showing input values, a finite automaton having n states (q1, q2, ..., qn), and output values]


Description of Finite Automaton
This finite automaton is also known as a Deterministic Finite Automaton
(DFA). But when the transition function maps Q × Σ into 2^Q (2^Q is the
power set of Q), the automaton is known as a
Non-deterministic Finite Automaton (NDFA or NFA).
A finite automaton is defined as a 5-tuple (Q, Σ, δ, q0, F),
where, Q is a finite non-empty set of states.

Σ is a finite non-empty set of inputs, called the alphabet.

δ is the transition function, which maps Q × Σ into Q.

q0 is the initial state and q0 ∈ Q.

F is the set of final states and F ⊆ Q.
Transition Diagrams


A transition diagram is a finite directed 
labelled graph in which each vertex 
represents a state and directed edges 
indicate the transition from one state to 
another. Edges are labelled with 
input/output. In this, the initial state is 
represented by a circle with an arrow towards it, the final state is 
represented by two concentric circles and intermediate states are 
represented by just a circle. 

e.g., In the given transition diagram, vertex A is the initial state of the finite
automaton, vertices A and B are the final states, and all the edges of this
transition diagram are labelled with the inputs.


[Figure: Transition diagram]


Transition Table 


Transition table is the tabular representation of a transition system. In this
representation, the initial (start) state is marked by an arrow towards it and
a final state is marked by a circle.


[Figure: Transition diagram]

Transition table

State   a    b
q0      q2   q1
q1      q3   q0
q2      q0   q3

In this transition table, only the three states q0, q1 and q2 are listed; q3 is
not listed because no other state is reachable from q3.


Key Points


+ Transition function is usually represented by a transition table or a transition
diagram.

+ Finite automata are finite collections of states with transition rules that take
input to move from one state to another.

+ The machine starts in the start state (initial state) and reads in a string of
symbols from its alphabet. It uses the transition function δ to determine the
next state using the current state and the symbol just read. If, when it has
finished reading, it is in an accepting state, it is said to accept the string;
otherwise it is said to reject the string.
Acceptability of a String by Finite Automata


A string w is accepted by a finite automaton M = (Q, Σ, δ, q0, F) if
δ̂(q0, w) ∈ F, where q0 is the initial (start) state, F is the set of final states
and δ̂ is the extended transition function defined below.


Note A finite automaton must accept exactly those strings which are in the given
language and must not accept any string which is not in the language.


Transition Function It takes two arguments, a state and an input
symbol. δ(q, a) is the transition function of a DFA (Deterministic Finite
Automaton): when the DFA is in state q and receives the input a, it
moves to the next state δ(q, a).


Extended Transition Function It takes two arguments, a state and an
input string.
δ̂(q, λ) = q
δ̂(q, w) = δ(δ̂(q, x), a)


where w = xa, in which a is the last symbol of w and x is the
remaining string except the last symbol.


Key Points


+ A finite automaton has only string pattern recognising power.


+ Generally, an FA has only one reading head on the input tape, but an FA
having more than one reading head has the same power as an FA with one reading
head.


+ A language L ⊆ Σ* is regular if and only if there is an FA that recognises L.
Language of a DFA
Automata of all kinds define languages. If A is an automaton, L(A) is
its language.


For a DFA A, L(A) is the set of strings labelling paths from the start state to
a final state. Formally, L(A) is the set of strings w such that δ̂(q0, w) is in F
(the set of final states).

e.g., String 101 is in the language of the DFA below. Start at A. Follow the
arc labelled 1 to B, then the arc labelled 0 from the current state B, and
finally the arc labelled 1 from the current state A. The result is an accepting
state, so 101 is in the language.

[Figure: Transition diagram of a string 101]
The language of our example DFA is
L = {w | w is in {0, 1}* and w does not have two consecutive 1's}


• Read a set former as 'the set of strings w such that these conditions about
w are true'.
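The extended transition function can be coded exactly as its recursive definition, and the set-former description of L can then be checked by brute force. This is a minimal sketch; the transition dictionary is an assumed reconstruction of the example DFA (A is the start state, B means "the last symbol was 1", C is a dead state reached after reading 11), since the figure itself did not survive.

```python
from itertools import product

# Assumed reconstruction of the example DFA: A is the start state, B means
# "the last symbol read was 1", C is a dead state entered after reading 11.
DELTA = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "A", ("B", "1"): "C",
    ("C", "0"): "C", ("C", "1"): "C",
}
START, FINAL = "A", {"A", "B"}

def delta_hat(q, w):
    """Extended transition function:
    delta_hat(q, λ) = q and delta_hat(q, xa) = delta(delta_hat(q, x), a)."""
    if w == "":
        return q
    x, a = w[:-1], w[-1]
    return DELTA[(delta_hat(q, x), a)]

def accepts(w):
    return delta_hat(START, w) in FINAL

# Check the set-former description for every string up to length 7
for n in range(8):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert accepts(w) == ("11" not in w)

print(accepts("101"))   # True
```

The recursion peels off the last symbol, mirroring w = xa; an iterative loop over the symbols gives the same result.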


Equivalence of DFA and NDFA 

In contrast to the NFA (NDFA), the Deterministic Finite Automaton (DFA) has
• No λ-transition (null transition).

• For every (q, a) with q ∈ Q and a ∈ Σ, at most one successor state.


• A deterministic finite automaton can simulate the behaviour of an NFA by
increasing the number of states.


Key Points


+ In a Deterministic Finite Automaton (DFA), for each state there is at most one
transition for each possible input.

+ In a non-deterministic finite automaton, there can be more than one transition
from a given state for a given possible input.

+ If a language L is accepted by an NFA, then there is a DFA that accepts L.

+ Every DFA is an NDFA, but not every NFA is a DFA.

+ NDFAs and DFAs have the same power.

+ Processing of an input string is more time consuming when an NFA is used and
less time consuming when a DFA is used.
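The claim that a DFA can simulate an NFA "by increasing the number of states" is usually realised with the subset construction, where each DFA state is a set of NFA states. A minimal Python sketch, assuming an NFA without λ-moves whose transitions are stored in a dictionary; the example NFA (strings over {0, 1} ending in 01) is hypothetical.

```python
def nfa_to_dfa(alphabet, delta, start, finals):
    """Subset construction for an NFA without λ-moves.
    delta maps (state, symbol) -> set of states; a missing key means no move.
    Each DFA state is a frozenset of NFA states."""
    dfa_start = frozenset([start])
    dfa_delta, dfa_finals = {}, set()
    todo, seen = [dfa_start], {dfa_start}
    while todo:
        S = todo.pop()
        if S & finals:                         # contains an NFA final state
            dfa_finals.add(S)
        for a in alphabet:
            T = frozenset(p for q in S for p in delta.get((q, a), ()))
            dfa_delta[(S, a)] = T
            if T not in seen:                  # explore each subset once
                seen.add(T)
                todo.append(T)
    return dfa_delta, dfa_start, dfa_finals

def run_dfa(dfa_delta, start, finals, word):
    S = start
    for a in word:
        S = dfa_delta[(S, a)]
    return S in finals

# Hypothetical NFA over {0, 1} accepting strings that end in "01"
nfa = {("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"}, ("q1", "1"): {"q2"}}
d, s, f = nfa_to_dfa("01", nfa, "q0", {"q2"})
print(len({S for S, _ in d}))                  # 3 reachable subsets
print(run_dfa(d, s, f, "1101"), run_dfa(d, s, f, "0110"))   # True False
```

Only reachable subsets are built, so the DFA here has 3 states rather than all 2^3 = 8 subsets; in the worst case, though, the construction can need exponentially many states.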


Finite Automata having λ-moves


The NFA model can include transitions by which the FA can move to the next
state without consuming any input. Such transitions are called λ-moves.

[Figure: λ-moves finite automata]
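An NFA with λ-moves is usually simulated by taking the λ-closure (all states reachable through λ-moves alone) before the first symbol and after every symbol read. The following Python sketch shows this; the machine (loops on 0, 1 and 2 in q0, q1 and q2, with λ-moves q0 → q1 → q2, accepting 0*1*2*) is a hypothetical example, since the original figure is not recoverable.

```python
def lambda_closure(states, lam_moves):
    """All states reachable from `states` using λ-moves only.
    lam_moves maps a state to the states reachable by one λ-move."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in lam_moves.get(q, ()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure

def accepts(word, delta, lam_moves, start, finals):
    """Simulate an NFA with λ-moves: take the λ-closure before the first
    symbol and after every symbol read."""
    current = lambda_closure({start}, lam_moves)
    for a in word:
        step = {p for q in current for p in delta.get((q, a), ())}
        current = lambda_closure(step, lam_moves)
    return bool(current & finals)

# Hypothetical machine for 0*1*2*: loops on 0, 1, 2 and λ-moves q0 -> q1 -> q2
delta = {("q0", "0"): {"q0"}, ("q1", "1"): {"q1"}, ("q2", "2"): {"q2"}}
lam_moves = {"q0": {"q1"}, "q1": {"q2"}}
print(sorted(lambda_closure({"q0"}, lam_moves)))           # ['q0', 'q1', 'q2']
print(accepts("001122", delta, lam_moves, "q0", {"q2"}))   # True
```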





Finite Automata with
Output Capabilities
The finite automata explained in the above sections have binary output,
i.e., either they accept the string or they do not accept the string.

Under finite automata, there are two other machines:

Moore Machine


It is a finite automaton in which output is associated with each state. Each
state of a Moore machine has a fixed output.
Moore machine is defined as a 6-tuple (Q, Σ, Δ, δ, λ, q0),
where, Q is a finite non-empty set of states.
Σ is the set of input symbols.
Δ is the output alphabet.
δ is the transition function, which maps Σ × Q into Q.
λ is the output function, which maps Q into Δ.
q0 is the initial state.


Mealy Machine
It is a finite automaton in which the output depends upon both the present input
and the present state.


Mealy machine is also a 6-tuple (Q, Σ, Δ, δ, λ, q0), where all symbols except
λ have the same meaning as in the Moore machine. λ is the output function
mapping Σ × Q into Δ.


Key Points
+ We can construct an equivalent Mealy machine for a Moore machine and
vice-versa.


+ Let M1 and M2 be equivalent Moore and Mealy machines, respectively, and let
T1(w) and T2(w) be the outputs produced by M1 and M2 for an input string w.
Then the length of T1(w) is one greater than the length of T2(w), i.e.,
|T1(w)| = |T2(w)| + 1
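The length relation follows from when each machine emits output: a Moore machine emits for the start state before reading anything, while a Mealy machine emits once per input symbol. The sketch below uses a hypothetical parity machine (states E and O track whether an even or odd number of 1's has been read) and builds the Mealy machine by giving each transition the Moore output of the state it enters, which is one common conversion.

```python
# Hypothetical example: track the parity of 1's read so far.
DELTA = {("E", "0"): "E", ("E", "1"): "O",
         ("O", "0"): "O", ("O", "1"): "E"}
MOORE_OUT = {"E": "0", "O": "1"}          # output attached to each state

def run_moore(w, q="E"):
    out = [MOORE_OUT[q]]                   # Moore emits for the start state too
    for a in w:
        q = DELTA[(q, a)]
        out.append(MOORE_OUT[q])
    return "".join(out)

# Moore -> Mealy: the output on transition (q, a) is the Moore output
# of the state that transition enters.
MEALY_OUT = {(q, a): MOORE_OUT[p] for (q, a), p in DELTA.items()}

def run_mealy(w, q="E"):
    out = []
    for a in w:
        out.append(MEALY_OUT[(q, a)])      # output depends on state and input
        q = DELTA[(q, a)]
    return "".join(out)

w = "1101"
print(run_moore(w), run_mealy(w))          # 01001 1001
```

Apart from the extra leading symbol (the start state's output), the two output strings coincide, which is why |T1(w)| = |T2(w)| + 1.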


Grammar 


In 1956, Noam Chomsky gave a model of grammar. A grammar is defined
as a 4-tuple (VN, Σ, P, S),
where, VN is a finite non-empty set of non-terminals.
Σ is a finite non-empty set of terminals.
P is a finite set of production rules.
S is the start symbol.


Chomsky Hierarchy 


The Chomsky hierarchy is a containment hierarchy of classes of formal 
grammars that generate formal languages. The formal grammars consist 
of a finite set of terminal symbols, a finite set of non-terminal symbols, a set 
of production rules with a left and a right hand side consisting of a word of 
these symbols and a start symbol. 
The Chomsky hierarchy consists of the following four levels.
Type 0 Grammar (Unrestricted Grammar)


These are unrestricted grammars, which include all formal grammars.
These grammars generate exactly all languages that can be recognized
by a Turing machine.


Rules are of the form α → β,
where α and β are arbitrary strings over a vocabulary V and α ≠ λ (null).
Type 1 Grammar (Context Sensitive Grammar)


Languages defined by type-1 grammars are accepted by linear
bounded automata.


Rules are of the form αAβ → αBβ,
where A ∈ VN,
α, β, B ∈ (VN ∪ Σ)*,
B ≠ λ.
Type 2 Grammar (Context-free Grammar)


Languages defined by type-2 grammars are accepted by pushdown
automata.


Rules are of the form A → α,
where A ∈ VN,
α ∈ (VN ∪ Σ)*.
Type 3 Grammar (Regular Grammar)


Languages defined by type-3 grammars are accepted by finite state
automata.


Rules are of the form A → λ,
A → a,
A → aB,
where A, B ∈ VN and a ∈ Σ.


Comparison between Different Types of Classes

Class    Grammars            Languages                                       Automaton
Type-0   Unrestricted        Recursively enumerable (Turing recognizable)   Turing machine
Type-1   Context sensitive   Context sensitive                               Linear bounded
Type-2   Context-free        Context-free                                    Pushdown
Type-3   Regular             Regular                                         Finite
Regular Expression

Regular expressions are used to represent certain sets of strings in an
algebraic fashion.

A regular expression over the alphabet Σ is defined as follows:

• φ is a regular expression corresponding to the empty language φ.

• λ is a regular expression corresponding to the language {λ}.

• For each symbol a ∈ Σ, a is a regular expression corresponding to the
language {a}.


Regular Language 


The languages accepted by FA are regular languages and these 
languages are easily described by simple expressions called regular 
expressions. 


For any regular expressions r and s over Σ, corresponding to the languages
Lr and Ls respectively, each of the following is a regular expression
corresponding to the language indicated.


• (rs) corresponding to the language LrLs
• (r + s) corresponding to the language Lr ∪ Ls
• r* corresponding to the language Lr*


Some examples of regular expressions are
(i) L(01) = {01}   (ii) L(01 + 0) = {01, 0}
(iii) L(0(1 + 0)) = {01, 00}   (iv) L(0*) = {λ, 0, 00, 000, ...}
(v) L((0 + 10)*(λ + 1)) = all strings of 0's and 1's without two
consecutive 1's.
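Example (v) can be checked mechanically. In this sketch the expression is transcribed into Python's `re` syntax (union + becomes `|`, and λ + 1 becomes an optional `1`), and `re.fullmatch` is compared against the English description for every binary string up to length 9. The transcription is ours; Python regex notation differs from the handbook's.

```python
import re
from itertools import product

# (0 + 10)*(λ + 1) in Python regex syntax: union '+' becomes '|',
# and (λ + 1) becomes an optional '1'.
NO_11 = re.compile(r"(0|10)*1?")

def in_language(w):
    return NO_11.fullmatch(w) is not None

# Compare with the English description for every string up to length 9
for n in range(10):
    for bits in product("01", repeat=n):
        w = "".join(bits)
        assert in_language(w) == ("11" not in w)

print(in_language("10101"), in_language("0110"))   # True False
```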


Key Points

+ If L1 and L2 are regular languages over Σ, then L1 ∪ L2, L1 ∩ L2, L1 − L2 and L̄1
(complement of L1) are all regular languages.

+ The pumping lemma is a useful tool to prove that a certain language is not regular.


+ The Myhill-Nerode theorem can be used to determine whether a given language is
regular or not: L is regular if and only if the Nerode equivalence relation of L
has finitely many classes.
Regular Set
A set represented by a regular expression is called a regular set.
e.g., If Σ = {a, b} is an alphabet, then


Different Regular Sets and their Expressions

Sets                    Regular Expression
φ                       φ
{a}                     a
{a, b}                  a + b
{λ, a, aa, aaa, ...}    a*
{a, aa, aaa, ...}       aa* = a+








Identities for Regular Expressions


The following are some identities for regular expressions, where P, Q and R
are regular expressions.

• φ + R = R + φ = R
• φR = Rφ = φ
• λR = Rλ = R
• λ* = λ and φ* = λ
• R + R = R
• (R*)* = R*
• R*R* = R*
• RR* = R*R = R+ and λ + RR* = R*
• (P + Q)* = (P*Q*)* = (P* + Q*)*
• R(P + Q) = RP + RQ and (P + Q)R = PR + QR
• P(QP)* = (PQ)*P


Regular Expression and Finite Automata 


NFA with λ-moves Let r1 and r2 be two regular expressions over Σ, and let
E1 and E2 be two NFAs for r1 and r2, respectively.


Regular Expression to λ-NFA (Union)

[Figure: State diagram of λ-NFA for E1 ∪ E2]
Regular Expression to λ-NFA (Concatenation)

[Figure: State diagram of λ-NFA for E1E2]


Regular Expression to λ-NFA (Closure)

[Figure: State diagram of λ-NFA for E1*]
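The three constructions above (union, concatenation, closure), together with one small machine per symbol, are commonly known as Thompson's construction. The following Python sketch is one possible encoding under stated assumptions: a fragment is a tuple (start, accept, edges), `None` marks a λ-move, and all helper names are hypothetical. It builds a λ-NFA for example (v), (0 + 10)*(λ + 1), and simulates it with λ-closures.

```python
from itertools import count

_ids = count()          # fresh state numbers

# A fragment is (start, accept, edges); an edge is (src, label, dst)
# and label None marks a λ-move.
def symbol(a):
    s, f = next(_ids), next(_ids)
    return s, f, [(s, a, f)]

def lam():
    s, f = next(_ids), next(_ids)
    return s, f, [(s, None, f)]

def union(e1, e2):
    s, f = next(_ids), next(_ids)
    return s, f, e1[2] + e2[2] + [(s, None, e1[0]), (s, None, e2[0]),
                                  (e1[1], None, f), (e2[1], None, f)]

def concat(e1, e2):
    # λ-move from the accept state of E1 to the start state of E2
    return e1[0], e2[1], e1[2] + e2[2] + [(e1[1], None, e2[0])]

def star(e):
    s, f = next(_ids), next(_ids)
    return s, f, e[2] + [(s, None, e[0]), (e[1], None, f),
                         (e[1], None, e[0]),   # repeat E
                         (s, None, f)]         # zero repetitions

def accepts(frag, word):
    start, accept, edges = frag

    def closure(states):
        seen, stack = set(states), list(states)
        while stack:
            q = stack.pop()
            for src, label, dst in edges:
                if src == q and label is None and dst not in seen:
                    seen.add(dst)
                    stack.append(dst)
        return seen

    current = closure({start})
    for a in word:
        current = closure({d for s, l, d in edges if s in current and l == a})
    return accept in current

# (0 + 10)*(λ + 1): strings over {0, 1} without two consecutive 1's
r = concat(star(union(symbol("0"), concat(symbol("1"), symbol("0")))),
           union(lam(), symbol("1")))
print(accepts(r, "10101"), accepts(r, "0110"))   # True False
```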


Construction of Regular Expression from DFA
If a DFA recognises the regular language L, then there exists a regular
expression which describes L.


Arden's Theorem

• If P and Q are two regular expressions over an alphabet Σ such that P does not
contain λ, then the equation R = Q + RP in R has a unique solution, R = QP*.

• Arden's theorem is used to determine the regular expression represented by a
transition diagram.
The following points are assumed regarding transition diagrams:
• The transition system does not have any λ-move.
• It has only one initial (starting) state.
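The equation can be solved by repeatedly substituting it into itself. The derivation below is the standard textbook argument, written out here as a worked illustration rather than quoted from this handbook:

```latex
\begin{aligned}
R &= Q + RP \\
  &= Q + (Q + RP)P = Q + QP + RP^{2} \\
  &= Q + QP + QP^{2} + RP^{3} \\
  &\;\;\vdots \\
  &= Q\left(\lambda + P + P^{2} + \cdots + P^{k}\right) + RP^{k+1}, \qquad k \ge 0
\end{aligned}
```

Because P does not contain λ, every string in RP^(k+1) has length at least k + 1, so a string of length k belongs to R exactly when it belongs to Q(λ + P + ... + P^k). Letting k grow gives R = QP*, and QP* does satisfy the equation, since Q + QP*P = Q(λ + P*P) = QP*.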


Properties of Regular Language 


Regular languages are closed under the following properties


(i) Union (ii) Concatenation 
(iii) Kleene closure (iv) Complementation 
(v) Transpose (vi) Intersection 


• If A is polynomial-time reducible to B (i.e., A ≤p B) and B is in P, then A is
also in P.

• If A is an NP-complete problem, then A is a member of P if and only if P = NP.
• The complexity class NP-complete is the intersection of the NP and NP-hard
classes.


Key Points


+ All NP problems can be solved in polynomial time if some NP-hard problem can be
solved in polynomial time, because every NP problem can be reduced to an NP-hard
problem in polynomial time.
+ If A is NP-complete and there is a polynomial time reduction of A to B, then B
is NP-hard; if B is also in NP, then B is NP-complete.


+ If there is a language L such that L ∈ P, then the complement of L is also in P.

+ P complexity class is closed under union, intersection, complementation,
Kleene closure and concatenation.

+ NP complexity class is closed under union, intersection, concatenation and
Kleene closure.


Design of Automaton


The design can be illustrated under the following three categories.


Sequence of string is fixed


If the sequence of the string is fixed, suppose an automaton is designed to
accept the string abbacb.

[Figure: Chain of states accepting abbacb]


The value of string is changed by changing input value
Suppose an automaton is designed to accept the string abba or bbba.

[Figure: Transition diagram accepting abba or bbba]


Transition Table

State   a    b
q1      q2   q2
q2      -    q3
q3      -    q4
q4      q5   -
q5      -    -
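The transition table above can be run directly as a partial DFA, where a missing entry ("-") means the machine has no move and rejects. This sketch assumes q1 is the start state and q5 the only final state; the table itself does not mark them, but that is the reading under which it accepts exactly abba and bbba.

```python
# Transition table from the text; a missing entry ('-') means no move,
# so the string is rejected. Assumed: q1 is the start state and q5 the
# only final state.
TABLE = {
    "q1": {"a": "q2", "b": "q2"},
    "q2": {"b": "q3"},
    "q3": {"b": "q4"},
    "q4": {"a": "q5"},
    "q5": {},
}

def accepts(w, start="q1", finals=frozenset({"q5"})):
    q = start
    for c in w:
        q = TABLE[q].get(c)
        if q is None:          # no transition defined: reject
            return False
    return q in finals

print(accepts("abba"), accepts("bbba"), accepts("abbb"))   # True True False
```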
When input values are repeated


Suppose an automaton is to be designed to accept all strings generated
by input a, i.e., it has to accept a, aa, aaa, ...


The Transition Table


This will be designed as

State   a
q1      q2
q2      q2





Now, suppose we have to design a machine which accepts all inputs containing
the string ab.

[Figure: Transition diagram accepting all strings containing ab]
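One way to build the "contains ab" machine is to let each state remember how much of ab has been matched so far. In this minimal sketch the state names are hypothetical: q0 means no progress, q1 means the last symbol was a, and q2 means ab has already been seen. q2 is final and loops on both inputs, because once ab occurs the rest of the string does not matter.

```python
# q0: no progress yet, q1: last symbol was 'a', q2: "ab" seen (final, absorbing)
DELTA = {
    ("q0", "a"): "q1", ("q0", "b"): "q0",
    ("q1", "a"): "q1", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}

def contains_ab(w):
    q = "q0"
    for c in w:
        q = DELTA[(q, c)]
    return q == "q2"

print(contains_ab("bbaab"), contains_ab("bbba"))   # True False
```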


Context-Free Language 





In formal language theory, a context-free language is a language generated
by some context-free grammar. The set of all context-free languages is
identical to the set of languages accepted by pushdown automata.


Finite automata accept all regular languages. Many simple languages are
non-regular, and there is no finite automaton that accepts them.
e.g., The non-regular languages are like
{aⁿbⁿ : n = 0, 1, 2, ...}

{ww^R : w ∈ {a, b}*}
Context-free languages are a larger class of languages that encompasses
all regular languages and many others, including the two above. But the
reverse is not true, i.e., not every context-free language is
regular.
Context-Free Grammar (CFG)
A context-free grammar is defined as a 4-tuple (Σ, VN, P, S),
where, Σ is an alphabet (each character in Σ is called a terminal).
VN is a set of non-terminals.
P is the set of production rules, where every production has the form
A → α, where A ∈ VN, α ∈ (VN ∪ Σ)*.
S ∈ VN is the start symbol.
e.g., Here is a formal CFG for {0ⁿ1ⁿ : n ≥ 1}
Terminals = {0, 1}
Non-terminals = {S}
Start symbol = S
Productions: S → 01; S → 0S1


Note The languages which are generated by context-free grammars are called
Context-Free Languages (CFL).
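To see which strings a grammar generates, one can expand sentential forms breadth-first, always rewriting the leftmost non-terminal. The sketch below does this for the grammar S → 01 | 0S1 above, under two assumptions of our own: non-terminals are upper-case letters, and forms with more than `max_len` terminals are pruned. Terminals are never removed by a derivation step, which makes the pruning safe; it is enough to terminate for this grammar, though grammars with λ-productions may need a stronger bound.

```python
from collections import deque

# Grammar for {0^n 1^n : n >= 1}: S -> 01 | 0S1.
# Assumed convention: non-terminals are upper-case letters.
PRODUCTIONS = {"S": ["01", "0S1"]}

def generate(start="S", max_len=6):
    """Terminal strings with at most max_len symbols derivable from `start`,
    found by expanding the leftmost non-terminal breadth-first."""
    result, queue, seen = set(), deque([start]), {start}
    while queue:
        form = queue.popleft()
        i = next((k for k, c in enumerate(form) if c.isupper()), None)
        if i is None:                      # no non-terminal left: a sentence
            result.add(form)
            continue
        for rhs in PRODUCTIONS[form[i]]:
            new = form[:i] + rhs + form[i + 1:]
            # terminals are never removed, so too many terminals = dead end
            if sum(not c.isupper() for c in new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(result, key=len)

print(generate())   # ['01', '0011', '000111']
```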


Derivations 


A derivation of a string is a sequence of rule applications. The language
defined by a context-free grammar is the set of strings derivable from the
start symbol S (for sentence).


Definition
If v is one-step derivable from u, we write u ⇒ v. If v is derivable from u,
we write u ⇒* v; this holds if there is a chain of one-step derivations of the form
u ⇒ u1 ⇒ u2 ⇒ ... ⇒ v


e.g., Consider the context-free grammar
G = ({S}, {0, 1}, P, S)
Productions (i) S → 0S1   (ii) S → λ, or just S → 0S1 | λ
Derivations are
(a) S ⇒ 0S1      using (i)
      ⇒ 01       using (ii)

(b) S ⇒ 0S1      using (i)
      ⇒ 00S11    using (i)
      ⇒ 000S111  using (i)
      ⇒ 000111   using (ii)
Derivation Tree


A derivation tree (or parse tree) can be defined with any non-terminal as the
root; internal nodes are non-terminals and leaf nodes are terminals. Every
derivation corresponds to one derivation tree.

If a vertex A has k children with labels A1, A2, A3, ..., Ak, then
A → A1A2A3...Ak will be a production in the context-free grammar G.

e.g., Consider the grammar
S → AB
A → aAA | aA | a
B → bB | b

[Figure: Example derivation trees, with yields aaAA and aAab]


Derivation of a String
• Every derivation corresponds to one derivation tree.
S ⇒ AB
  ⇒ aAAB
  ⇒ aaAB
  ⇒ aaaB
  ⇒ aaab
• Every derivation tree corresponds to one or more derivations.
(1) S ⇒ AB ⇒ aAAB ⇒ aaAB ⇒ aaaB ⇒ aaab
(2) S ⇒ AB ⇒ Ab ⇒ aAAb ⇒ aAab ⇒ aaab
(3) S ⇒ AB ⇒ aAAB ⇒ aAAb ⇒ aaAb ⇒ aaab
Left Most Derivation and Right Most Derivation 


A derivation is leftmost (rightmost) if at each step in the derivation a
production is applied to the leftmost (rightmost) non-terminal in the
sentential form. In the above three derivations, the first one is a leftmost
derivation, the second one is a rightmost derivation and the third is neither.


Ambiguous Grammar
A context-free grammar G is ambiguous if there is at least one string in L(G)
having two or more distinct derivation trees (or, equivalently, two or more
distinct leftmost derivations or two or more distinct rightmost derivations).
e.g., consider the context-free grammar G having productions
E → E + E | a. The string a + a + a has two leftmost derivations.


Let's see the derivations
E ⇒ E + E ⇒ E + E + E ⇒ a + E + E ⇒ a + a + E ⇒ a + a + a

E ⇒ E + E ⇒ a + E ⇒ a + E + E ⇒ a + a + E ⇒ a + a + a

and the derivation trees are

[Figure: The two derivation trees of a + a + a]
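The ambiguity can also be measured by counting derivation trees: for E → E + E | a, a tree over a token span is either the single token a or a choice of top-level + with a tree on each side. The following Python sketch (a dynamic-programming count specific to this one grammar; `count_trees` is a hypothetical name) confirms the two trees for a + a + a.

```python
from functools import lru_cache

def count_trees(tokens):
    """Number of distinct derivation trees for E -> E + E | a over a
    token sequence such as ['a', '+', 'a', '+', 'a']."""
    @lru_cache(maxsize=None)
    def count(i, j):                       # trees deriving tokens[i:j] from E
        total = 1 if j - i == 1 and tokens[i] == "a" else 0
        for k in range(i + 1, j - 1):      # tokens[k] is the top-level '+'
            if tokens[k] == "+":
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(tokens))

print(count_trees(list("a+a+a")))          # 2, so the grammar is ambiguous
print(count_trees(list("a+a+a+a")))        # 5
```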


Potential Algorithmic Problems for 
Context-Free Grammars 
• Is L(G) empty?
• Is L(G) finite?
• Is L(G) infinite?
• Is L(G1) = L(G2)?
• Is G ambiguous?




CFG Simplification 

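The following four main steps are followed in CFG simplification:
• Eliminate ambiguity.

• Eliminate useless symbols/productions.

• Eliminate λ-productions: A → λ

• Eliminate unit productions: A → B


Eliminate the Ambiguity
We try to remove the ambiguity by rewriting the grammar; the grammar is also
transformed by removing left recursion and left factoring (these two
transformations make a grammar suitable for top-down parsing but do not by
themselves remove ambiguity).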


Left Recursion 
A production of the context-free grammar G = (VN, Σ, P, S) is said to be left
recursive if it is of the form
A → Aα
where A is a non-terminal and
α ∈ (VN ∪ Σ)*.
Removal of Left Recursion
Let the variable A have left recursive productions as follows
(i) A → Aα1 | Aα2 | Aα3 | ... | Aαn | β1 | β2 | β3 | ... | βm


where β1, β2, ..., βm do not begin with A. Then we replace the A-productions
by the form


(ii) A → β1A' | β2A' | ... | βmA', where


A' → α1A' | α2A' | ... | αnA' | λ
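The replacement rule (i) → (ii) is mechanical, so it can be coded in a few lines. In this sketch each right-hand side is a list of grammar symbols and `[]` stands for λ; the function name and the choice of naming the new variable A' by appending a prime are illustrative.

```python
def remove_left_recursion(A, alternatives):
    """Remove immediate left recursion from the productions of A.
    Each alternative is a list of symbols; [] stands for λ.
    Returns a dict with productions for A and the new variable A'."""
    A2 = A + "'"
    alphas = [alt[1:] for alt in alternatives if alt[:1] == [A]]   # A -> A alpha
    betas = [alt for alt in alternatives if alt[:1] != [A]]        # A -> beta
    if not alphas:                         # nothing to remove
        return {A: alternatives}
    return {
        A: [beta + [A2] for beta in betas],                 # A  -> beta A'
        A2: [alpha + [A2] for alpha in alphas] + [[]],      # A' -> alpha A' | λ
    }

# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | λ
print(remove_left_recursion("E", [["E", "+", "T"], ["T"]]))
```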


Left Factoring
Two or more productions of a variable A of the grammar G = (VN, Σ, P, S)
are said to have left factoring if the productions are of the form

A → αβ1 | αβ2 | ... | αβn, where β1, ..., βn ∈ (VN ∪ Σ)*
Removal of Left Factoring
Let the variable A have left factoring productions as follows


A → αβ1 | αβ2 | ... | αβn | γ1 | γ2 | γ3 | ... | γm


where αβ1, αβ2, ..., αβn have the common factor α and γ1, γ2, ..., γm do not
contain α as a prefix. Then we replace the productions by the form


A → αA' | γ1 | γ2 | ... | γm, where
A' → β1 | β2 | ... | βn


Eliminate the Useless Productions/Symbols

Symbols that cannot be used in any derivation, because they are not reachable
from the start symbol or cannot derive a terminal string, are known as useless
symbols.


e.g., consider the grammar G with the following production rules
S → aS | A | C
A → a
B → aa
C → aCb

Step 1 Generate the list of variables that produce terminal strings:
U = {A, B, S}
Because C does not produce a terminal string, its production will be
deleted. Now the modified productions are


S → aS | A
A → a
B → aa

Step 2 Identify the variable dependency graph. In this graph S depends only on
A (S → A), and variable B is not reachable from S, so B will be deleted too.
Now the productions are
S → aS | A
A → a
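Both steps can be written as fixed-point computations. In this sketch, under the assumption that upper-case letters are variables and everything else is a terminal, step 1 keeps only variables that derive some terminal string and step 2 keeps only variables reachable from the start symbol. The order matters: removing non-generating symbols first can make further symbols unreachable.

```python
def remove_useless(productions, start):
    """productions: dict variable -> list of right-hand sides (strings).
    Assumed convention: upper-case letters are variables."""
    def ok(rhs, generating):
        return all(not c.isupper() or c in generating for c in rhs)

    # Step 1: variables that derive some terminal string
    generating, changed = set(), True
    while changed:
        changed = False
        for A, rhss in productions.items():
            if A not in generating and any(ok(r, generating) for r in rhss):
                generating.add(A)
                changed = True
    prods = {A: [r for r in rhss if ok(r, generating)]
             for A, rhss in productions.items() if A in generating}

    # Step 2: variables reachable from the start symbol
    reachable, stack = {start}, [start]
    while stack:
        for rhs in prods.get(stack.pop(), []):
            for c in rhs:
                if c.isupper() and c not in reachable:
                    reachable.add(c)
                    stack.append(c)
    return {A: rhss for A, rhss in prods.items() if A in reachable}

G = {"S": ["aS", "A", "C"], "A": ["a"], "B": ["aa"], "C": ["aCb"]}
print(remove_useless(G, "S"))   # {'S': ['aS', 'A'], 'A': ['a']}
```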


Eliminate Null Productions
If a variable can derive λ, it is called a nullable variable.
e.g., for A → λ, variable A is said to be a nullable variable.
Step 1 Find the nullable variables in the given production list.
Step 2 Find all productions which do not include null productions.
P̂ = P − {null productions}
e.g., consider the CFG having the following productions
S → ABaC
A → BC
B → b | λ
C → D | λ
D → d
Solution: First find the nullable variables. Initially the set is empty:
N = {}
N = {B, C}
N = {A, B, C}
Due to the variables B and C,
A will also be a nullable variable.
Step 3 P̂ = P − {null productions}
S → BaC | AaC | ABa | aC | Ba | Aa | a
A → B | C
B → b
C → D
D → d
The above grammar contains every possible combination obtained by dropping
nullable variables, except λ.
Now combine this new grammar with the original grammar without its null productions:
S → ABaC | BaC | AaC | ABa | aC | Ba | Aa | a
A → BC | B | C
B → b
C → D
D → d
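The procedure above (find the nullable set, then keep every combination of dropping nullable symbols except λ) can be sketched in Python. Assumptions of this sketch: variables are single upper-case letters, terminals are lower-case, and an empty string stands for a λ right-hand side. If the start symbol itself were nullable, λ would need separate handling.

```python
from itertools import product

def nullable_vars(productions):
    """Variables that can derive λ; '' stands for a λ right-hand side."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for A, rhss in productions.items():
            # A is nullable if some right-hand side consists only of nullable variables
            if A not in nullable and any(all(c in nullable for c in rhs) for rhs in rhss):
                nullable.add(A)
                changed = True
    return nullable

def remove_null_productions(productions):
    """Keep every combination obtained by dropping nullable symbols, except λ."""
    nullable = nullable_vars(productions)
    result = {}
    for A, rhss in productions.items():
        new_rhss = []
        for rhs in rhss:
            options = [(c, "") if c in nullable else (c,) for c in rhs]
            for pick in product(*options):
                s = "".join(pick)
                if s and s not in new_rhss:
                    new_rhss.append(s)
        result[A] = new_rhss
    return result

G = {"S": ["ABaC"], "A": ["BC"], "B": ["b", ""], "C": ["D", ""], "D": ["d"]}
print(sorted(nullable_vars(G)))          # ['A', 'B', 'C']
print(remove_null_productions(G)["S"])   # ['ABaC', 'ABa', 'AaC', 'Aa', 'BaC', 'Ba', 'aC', 'a']
```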


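Eliminate the Unit-Productions
A production of the type A → B, where A and B are variables, is called a unit
production.

e.g., consider the grammar
S → Aa | B
B → A | bb
A → a | bc | B


Step 1 Using the unit productions, we create a dependency graph:
S → B, B → A, A → B
so S ⇒* B, S ⇒* A, B ⇒* A and A ⇒* B.

Step 2 Now add the productions obtained without unit productions:
S → bb | a | bc   (since S ⇒* B and S ⇒* A)
B → a | bc        (since B ⇒* A)
A → bb            (since A ⇒* B)


Now the final grammar is
S → Aa | bb | a | bc
B → bb | a | bc
A → a | bc | bb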


Normal Forms of CFGs 


Ambiguity is an undesirable property of a context-free grammar that we
might wish to eliminate. To convert a context-free grammar into normal
form, we start by trying to eliminate null productions of the form A → λ and
unit productions of the form B → C.
There are two normal forms 

1. Chomsky Normal Form (CNF) 

2. Greibach Normal Form (GNF) 
Chomsky Normal Form (CNF)


A context-free grammar G is said to be in Chomsky Normal Form if every
production is of the form either A → a (exactly a single terminal on the right
hand side of the production) or A → BC (exactly two variables on the right
hand side of the production).


e.g., the context-free grammar G with productions S → AB, A → a, B → b is
in Chomsky normal form.


Chomsky Normal Form Properties
• The number of steps in the derivation of any string w of length n (n ≥ 1) is
2n − 1, where the grammar is in CNF.

• The minimum height of the derivation tree of any w of length n is ⌈log2 n⌉ + 1.
• The maximum height of the derivation tree of any w of length n is n.


Greibach Normal Form (GNF)
A context-free grammar is said to be in Greibach Normal Form if every
production is of the form

A → aα

where a ∈ Σ, A ∈ VN and α ∈ VN*.


Decision Algorithms for CFLs 

Theorem There are algorithms to determine the following for a CFL L:
Empty Is there any word in L?

Finite Is L finite?

Infinite Is L infinite?

Membership For a particular string w, is w ∈ L?
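For membership, one standard algorithm (not named in this section, so treat it as a swapped-in illustration) is CYK, which works on a grammar in Chomsky Normal Form by filling a table of the variables that derive each substring. The CNF grammar below for {0ⁿ1ⁿ : n ≥ 1} (S → AB | AX, X → SB, A → 0, B → 1) is our own hypothetical example.

```python
# Hypothetical CNF grammar for {0^n 1^n : n >= 1}:
#   S -> AB | AX,  X -> SB,  A -> 0,  B -> 1
UNIT_RULES = [("A", "0"), ("B", "1")]                              # A -> a
PAIR_RULES = [("S", "A", "B"), ("S", "A", "X"), ("X", "S", "B")]   # A -> BC

def cyk(word, start="S", unit_rules=UNIT_RULES, pair_rules=PAIR_RULES):
    """Membership test for a grammar in Chomsky Normal Form."""
    n = len(word)
    if n == 0:
        return False   # these CNF rules cannot derive the empty string
    # table[i][l] holds the variables that derive word[i : i + l + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, a in enumerate(word):
        table[i][0] = {A for A, t in unit_rules if t == a}
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for A, B, C in pair_rules:
                    if B in left and C in right:
                        table[i][length - 1].add(A)
    return start in table[0][n - 1]

print(cyk("000111"), cyk("0101"))   # True False
```

The three nested loops over length, start position and split point give the usual O(n³) running time (times the number of rules).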


Pumping Lemma for CFLs 
The pumping lemma is used to prove that certain languages are not CFLs.


Closure Properties of CFLs
CFLs are closed under the following properties:


• Union
• Concatenation
• Kleene closure
• Substitution
• Homomorphism
• Inverse homomorphism


CFLs are not closed under the following properties:
• Intersection
• Complementation
Deterministic Context-Free Language (DCFL)


The set of deterministic context-free languages is a proper subset of the set
of context-free languages that possess an unambiguous context-free grammar.

[Figure: CFL contains DCFL, which contains the regular languages]


Key Points
+ The problem of whether a given context-free language is deterministic is
undecidable.
+ Deterministic context-free languages can be recognized by a deterministic
Turing machine in polynomial time and O(log² n) space.


+ The languages of this class have great practical importance in computer
science, as they can be parsed much more efficiently than non-deterministic
context-free languages.


Pushdown Automata (PDA) 


A Pushdown Automaton (PDA) is essentially an NFA with a stack. A PDA is
inherently non-deterministic. To handle a language like {aⁿbⁿ | n ≥ 0}, the
machine needs to remember the number of a's and b's. To do this, we use
a stack. So, a PDA is a finite automaton with a stack. A stack is a data
structure that can contain any number of elements but for which only the top
element may be accessed.


Definition of PDA

A Pushdown Automaton (PDA) is defined as a 7-tuple
M = (Q, Σ, Γ, δ, q0, Z, F)

where, Q is a finite set of states.
Σ is the input alphabet.
Γ is the stack alphabet.
δ is the transition function, which maps
Q × (Σ ∪ {λ}) × (Γ ∪ {λ}) into finite subsets of Q × (Γ ∪ {λ}).
q0 ∈ Q is the start state.
Z ∈ Γ is the initial stack symbol.
F ⊆ Q is the set of final or accepting states.
Here λ denotes the empty string.
Acceptance of PDA


The tape is divided into finitely many cells. Each cell contains a symbol of an
alphabet Σ. The stack head always scans the top symbol of the stack, as shown
in the figure.

[Figure: Push and pop of PDA, showing the tape, tape head, stack head and finite control]

It performs two basic operations:
Push: add a new symbol at the top.
Pop: read and remove the top symbol.

δ(q, a, v) = (p, u)
It means that if the tape head reads input a, the stack head reads v and the
finite control is in state q, then one of the possible moves is that the next
state is p, v is replaced by u on the stack and the tape head moves one cell to
the right.
δ(q, λ, v) = (p, u)
It means that this is a λ-move (null move).
δ(q, a, λ) = (p, u)
It means that a push operation is performed on the stack.
δ(q, a, v) = (p, λ)
It means that a pop operation is performed on the stack.


Key Points
+ In computer science, a pushdown automaton is a type of automaton that
employs a stack.
+ The PDA is used in theories about what can be computed by machines.


+ A PDA is more capable than a finite-state machine but less capable than a
Turing machine.


Non-deterministic PDA 

Like an NFA, a Non-deterministic PDA (NPDA) has a number of choices for its
inputs. An NPDA accepts an input if some sequence of choices leads to a
final state or causes the PDA to empty its stack.


Deterministic PDA

A Deterministic PDA (DPDA) is a pushdown automaton whose action in every
situation is fully determined, rather than facing a choice between multiple
alternative actions. DPDAs cannot handle languages or grammars with
ambiguity. A deterministic context-free language is a language recognised
by some deterministic pushdown automaton.
The following languages are DCFLs:
• L = {aⁿbⁿ : n ≥ 0}
• L = {aⁿcbⁿ : n > 0}
• L = {wcw^R : w ∈ (a + b)*}, but not L = {ww^R : w ∈ (a + b)*}


Key Points
+ For any context-free language L, there exists an NPDA m such that L = L(m).
+ If L = L(m) for some NPDA m, then L is a context-free language.
+ The family of context-free languages is closed under union, concatenation and
star closure (L*).
+ In a PDA for {aⁿbⁿ}, once input b has arrived there should be a state change
to ensure that only input b is acceptable afterwards.


+ To ensure that the number of a's equals the number of b's, we push all a's
onto the stack and, after reading b, we pop one a for each b.


Parsing 


Parsing is a technique to construct a parse (derivation) tree for a string, or to check whether the string has some leftmost derivation or rightmost derivation in the grammar.


Parsing may be classified as 


Top-down Parsing 

In top-down parsing, the start symbol is the root, the terminals (input symbols) are the leaves and the other nodes are variables. We start from the root and, replacing the intermediate nodes one by one from left to right, reach the leaves. This approach is used by recursive descent parsing and predictive parsing.

An LL parser is a top-down parser for a subset of the context-free grammars. It parses the input from left to right and constructs a leftmost derivation of the sentence. An LL parser is called an LL(k) parser if it constructs a leftmost derivation by looking k symbols ahead.

An LL parser consists of
• An input buffer, holding the input string.
• A stack on which to store the terminals and non-terminals from the grammar yet to be parsed.
• A parsing table, which tells it which grammar rule to apply given the symbol on top of its stack and the next input token.


Bottom-up Parsing 


A bottom-up parser reads the input from the left and uses a rightmost derivation in reverse order. Another name for a bottom-up parser is shift-reduce parser.
LR Grammar


For a grammar to be LR, it is sufficient that a left-to-right shift-reduce parser, implemented using a stack, is able to recognise handles when they appear on top of the stack. A grammar that can be parsed by an LR parser examining up to k input symbols (k look-ahead symbols) on each move is called an LR(k) grammar.


Points to be Remembered 


• For every regular set L, there exists a CFG G such that L = L(G).

• Every regular language is a CFL.

• Let G₁ and G₂ be context-free grammars. Then, G₁ and G₂ are equivalent if and only if L(G₁) = L(G₂).

• The intersection of a context-free language and a regular language is a context-free language.

• The reverse of a context-free language is context-free.

• A DFA can remember only a finite amount of information, whereas a PDA can remember an unbounded amount of information (on its stack).

• For every PDA, there is a context-free grammar, and for every context-free grammar, there is a PDA.

• If L₁ is a DCFL and L₂ is regular, then L₁ ∪ L₂ is also a DCFL.

• If L₁ is a DCFL and L₂ is a regular language, then L₁ ∩ L₂ is also a DCFL.

• Every regular language is a DCFL.

• The power of non-deterministic pushdown automata and deterministic pushdown automata is not the same, but the power of non-deterministic finite automata and deterministic finite automata is the same.

• An FSM (Finite State Machine) with one stack is more powerful than an FSM without a stack.

• If left recursion or a need for left factoring is present, the grammar is not necessarily ambiguous, but there may be a chance of ambiguity.


Difference between LL and LR 


There is a significant difference between LL and LR grammars. For a grammar to be LR(k), we must be able to recognise the occurrence of the right side of a production having seen all of what is derived from that right side, with k input symbols of look-ahead. The requirement for LL(k) grammars is quite different (and stricter): we must be able to recognise the use of a production by looking only at the first k symbols of what its right side derives.

LR (k) | LL (k)
L stands for left-to-right scanning. | L stands for left-to-right scanning.
R stands for rightmost derivation (constructed in reverse). | L stands for leftmost derivation.

Turing Machine 


The Turing Machine (TM) was invented by Alan Turing in 1936. Turing machines are the ultimate model for computers and have output capabilities. The languages accepted by Turing machines are said to be recursively enumerable. A Turing machine is a device with a finite amount of read-only hard memory (states) and an unbounded amount of read/write tape memory. There is no separate input; rather, the input is assumed to reside on the tape at the time when the TM starts running.


Recursive languages are closed under complementation, union, intersection, concatenation and Kleene closure.
A Turing machine is said to partially decide a problem if the following two conditions are satisfied:
(i) The problem is a decision problem.
(ii) The Turing machine accepts a given input if and only if the problem has the answer ‘yes’ for that input, i.e., the Turing machine accepts the language L of ‘yes’ instances.
A Turing machine is said to decide a problem if it partially decides the problem and all its computations are halting computations.
Deterministic algorithms are treated as a special case of non-deterministic ones, so we can conclude that P ⊆ NP.
The most famous unsolved problem in the theory of computation is whether

P = NP or P ≠ NP

A language L is called NP-hard if L₁ ≤ₚ L for every L₁ ∈ NP, where the notation ≤ₚ denotes the polynomial time reducibility relation.

A language L is said to be NP-complete if L ∈ NP and L is NP-hard.


Universal Turing Machine (UTM)

• A UTM is a specific Turing machine that can simulate the behaviour of any TM.
• A UTM is capable of running any algorithm.
• For simulating even a simple behaviour, a Universal Turing Machine must have a large number of states. If we modify our basic model by increasing the number of read/write heads and the number of dimensions of the input tape, and by adding a special-purpose memory, then we can design a Universal Turing Machine.
Definition of Turing Machine

A Turing Machine (TM) is defined as a 7-tuple (Q, Σ, Γ, δ, q₀, b, F),

where, Q is a finite non-empty set of states.

Σ is a non-empty set of input symbols (alphabet), which is a subset of Γ, and b ∉ Σ.

Γ is a finite non-empty set of tape symbols.

δ is the transition function, which maps (Q × Γ) → (Q × Γ × {L, R}),

i.e., a mapping from the present state of the automaton and the tape symbol to the next state, the tape symbol to be written and the movement of the head in the left or right direction along the tape.

q₀ is the initial state and q₀ ∈ Q.

b is the blank and b ∈ Γ.

F is the set of final states and F ⊆ Q.


Transition Function of a Turing Machine 


The transition function Q × Γ → Q × Γ × {L, R} states that if a Turing machine is in some state (from set Q) and reads a tape symbol (from set Γ), it goes to some next state (from set Q), overwriting (replacing) the current symbol by another or the same symbol, and the read/write head moves one cell either left (L) or right (R) along the tape.


Behaviour of Turing Machine 

Depending on the number of possible moves for each state and tape symbol, a TM may be deterministic or non-deterministic. If the TM has at most one move for each (state, symbol) pair, it is called a Deterministic TM (DTM); if it may have more than one move, it is called a Non-deterministic TM (NTM or NDTM).

• A non-deterministic TM is equivalent in power to a deterministic TM.

• Every 2-PDA (a PDA with two stacks) can be simulated by some single-tape TM.

• A read-only TM may be considered as a Finite Automaton (FA) with the additional property of being able to move its head in both directions (left and right).


Language Recognition by Turing Machine 

A TM can be used as a language recogniser. Turing machines recognise every regular language, CFL, CSL and Type-0 language.

There are several ways an input string might fail to be accepted by a Turing machine:

• It can lead to some non-halting configuration from which the Turing machine cannot move.

• At some point in the processing of the string, the tape head is scanning the first cell and the next move specifies moving the head left off the end of the tape.

In either of these cases, we say that the Turing machine crashes.


Variation of TM with other Automata 


• Multitape Turing Machine: A Turing machine with several tapes is said to be a multitape Turing machine. In a multitape Turing machine, each tape is controlled by its own independent read/write head.

• A Turing machine with multiple tapes is no more powerful than a one-tape Turing machine.

• Multi-dimensional Turing Machine: A Turing machine is said to be a multi-dimensional Turing machine if its tape can be viewed as extending infinitely in more than one dimension.

• Multihead Turing Machine: A multihead Turing machine can be viewed as a Turing machine with a single tape and a single finite state control but with multiple independent read/write heads.

• In one move, each read/write head may independently move left, move right or remain stationary.

• Offline Turing Machine: An offline Turing machine is a multitape Turing machine whose input tape is read-only (writing is not allowed).

• An offline Turing machine can simulate any Turing machine A by using one more tape than Turing machine A. The reason for using an extra tape is that the offline Turing machine makes a copy of its own input on the extra tape and then simulates Turing machine A as if the extra tape were A's input.


Halting Problem of Turing Machine 


A class of problems with two outputs (true/false) is called solvable (or decidable) if there exists some definite algorithm which always halts (also called terminates) with the answer; otherwise, the class of problems is called unsolvable (or undecidable).


Linear Bounded Automata (LBA) 


It is possible to limit the tape by restricting the way in which the tape can be used. One way of limiting the tape use is to allow the machine to use only that part of the tape occupied by the input. This way, more space is available for long input strings than for short strings, generating another class of machines called Linear Bounded Automata (LBA).

Recursive and Recursively Enumerable Languages

A language L is said to be recursively enumerable if there exists a Turing machine that accepts it. A language is recursive if and only if there exists a membership algorithm for it. Therefore, a language L on Σ is said to be recursive if there exists a Turing machine that accepts the language L and halts on every w ∈ Σ*. Recursively enumerable languages are closed under union, intersection, concatenation and Kleene closure, but they are not closed under complementation.


Key Points


+ The complement of a recursive language is recursive.
+ The union of two recursive languages is recursive.
+ There are some recursively enumerable languages which are not recursive.
+ If L is recursive, then its complement L′ is also recursive, and consequently both languages are recursively enumerable.
+ Every context-sensitive language is recursive.
+ The family of recursively enumerable languages is closed under union.
+ If a language is not recursively enumerable, then its complement cannot be recursive.
+ If a language L is recursive, then it is a recursively enumerable language, but the converse is not true.


Decidable Problems 


A problem is decidable if there is an algorithm that answers either yes or no for every input; in other words, there is a Turing machine that decides the problem.


Undecidable Problems 


A decision problem is undecidable if no Turing machine decides it, i.e., any Turing machine for the problem may run forever on some input without producing an answer.


Halting Problem

The halting problem is an undecidable problem. Alan Turing proved in 1936 that there is no general method or algorithm which can solve the halting problem for all possible inputs.


Computability and Complexity
• What can be decided algorithmically is known as computability.

• What resources (time, memory, communication) are needed is known as complexity.

Complexity Classes


• P-Class: P is also known as PTIME or DTIME complexity. It contains all decision problems which can be solved by a deterministic Turing machine using a polynomial amount of computation time.

• NP-Class: NP refers to non-deterministic polynomial time. The complexity class NP contains many problems that people would like to solve efficiently, but for which no efficient algorithm is known. P is a subset of NP.

• NP-Hard: A problem A is said to be NP-hard if, for every decision problem L in NP, there is a polynomial-time Oracle machine with an Oracle for A that decides L.

An Oracle machine is an abstract machine used to study decision problems. It can be visualised as a Turing machine with a black box, called an Oracle, which is able to decide certain decision problems in a single operation.

Polynomial Time Reducibility: If L₁ and L₂ are two languages over Σ₁ and Σ₂ respectively, then L₁ is said to be polynomial time reducible to L₂ (L₁ ≤ₚ L₂) if there is a function f : Σ₁* → Σ₂* such that for any string x ∈ Σ₁*,

x ∈ L₁ if and only if f(x) ∈ L₂, and f can be computed in polynomial time.
• NP-Completeness: An NP-complete problem is one for which the corresponding language is NP-complete. A language L is NP-complete if L ∈ NP and L is NP-hard.

[Figure: NP-completeness]


Some NP-Complete Problems 


Some NP-complete problems are given below:

• The satisfiability problem.
• The vertex cover problem.
• The Hamiltonian cycle problem.
• The travelling salesman problem (decision version).
• The colorability problem.
• The simple max cut problem.


Key Points


+ The set of all Turing machines, although infinite, is countable.

+ The halting problem of Turing machines is undecidable.

+ Let M be any Turing machine; then the question of whether or not L(M) is finite is undecidable.

+ The Post correspondence problem is undecidable.


Data Structure with 
Programming in C 


Programming Concepts in C Language 


C is a high-level language. It is a general-purpose language that is also used for specific purposes such as system programming. It was developed at Bell Laboratories of AT&T, USA, in 1972 by Dennis Ritchie; Brian Kernighan co-authored the classic book on C with Ritchie.


To learn C, we must first know which alphabets, numbers and special symbols are used in C, then how constants, variables and keywords are constructed from them and, finally, how these are combined to form an instruction.




Character Set 


The characters that can be used to form words, numbers and expressions 
depend upon the computer on which the program runs. 


The characters in C are grouped into the following categories 


(i) Letters (ii) Digits 
(iii) Special characters (iv) White spaces 
C Tokens 
The smallest individual units are known as C tokens. C has six types of tokens. 
[Figure: C tokens classification: keywords, identifiers, constants, operators, strings and special symbols]

Keywords

All keywords (i.e., reserved words) are basically the sequences of 
characters that have one or more fixed meanings. All C keywords must be 
written in lowercase letters. e.g., break, char, int, continue, default, do etc. 


Identifiers 

Identifiers are the names given to program elements such as variables, arrays and functions. They are sequences of letters, digits and underscores that must not begin with a digit, e.g., main, amount, emp_id etc.


Constants 
Constants are fixed values that do not change during the execution of a C program.


[Figure: Constants classification used in C language: numeric constants (integer constants, real constants) and character constants (single character constants, string constants)]


Backslash character constants are used in output functions, e.g., '\b' is used for backspace and '\n' for new line.


Operator 
An operator is a symbol that tells the computer to perform certain mathematical or logical manipulations, e.g., arithmetic operators (+, -, *, /).


String 
• A string is nothing but an array of characters (printable ASCII characters) terminated by the null character '\0'.
• Special Symbols, e.g., [ ], { } etc.


Key Points


+ An integer or real constant must have at least one digit.
+ No commas or blanks are allowed within integer and real constants.


Variable 


A variable is used to store data value. A variable is a data name that may 
take different values at different times during execution. 
Comparison between Valid Variables and Invalid Variables

S.No. | Valid Variables | Invalid Variables
1. | Marks | 6pk
2. | Total_mark | Total marks
3. | Area_of_cube | Area--of--cube
4. | Num[10] | Ram's-Age
5. | Population_2006 | Gross-salary-2009

Data Types 
[Figure: Data types classification. Primary/built-in: integer, character, floating point, double. Derived/user-defined: array, string, structure, union]


Key Points


+ An operation between a real and a real always yields a real result.
+ An operation between an integer and a real always yields a real result. In this case, the integer is first promoted to a real and then the operation is performed.


Delimiters/Separators 
These are used to separate constants, variables and statements e.g., 
comma, semicolon, apostrophes, double quotes and blank space etc. 


Different Types of Modifier with their Range

S.No. | Type of Modifier | Size (bytes) | Range of Values
1. | int | 2 | -32768 to +32767
2. | signed int | 2 | -32768 to +32767
3. | unsigned int | 2 | 0 to 65535
4. | short int | 2 | -32768 to +32767
5. | long int | 4 | -2147483648 to +2147483647
6. | float | 4 | -3.4E+38 to +3.4E+38
7. | double | 8 | -1.7E+308 to +1.7E+308
8. | char | 1 | -128 to 127
9. | unsigned char | 1 | 0 to 255
Flow Chart of Compiling and Running a C Program

[Figure: Flow chart of compiling and running a C program. Enter program code (source program) → edit source program → compile source program with the C compiler → object code (when there are no errors) → link with system library → executable object code → execute object code with input data. Data errors, logic errors, or logic and data errors lead back to editing the source program; with no errors, the result is correct output.]

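Key Points


+ Syntax errors (i.e., violations of the grammar of the language).
+ Logical errors (i.e., errors in the logic of the program introduced during the coding process; the program runs but gives wrong results).
+ Run-time errors (i.e., errors that occur when we attempt to run instructions that cannot be carried out, such as a division by zero).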


C Flow Control Statements 


A control statement is an instruction, statement or group of statements in a programming language which determines the sequence of execution of other instructions or statements.

C provides two styles of flow control:

1. Branching (deciding what action to take) 

2. Looping (deciding how many times to take a certain action) 


Decision Making and Branching 
C has three major decision making instructions as given below 


If Statement 
It takes an expression in parentheses and a statement or block of statements. The expression is assumed to be true if its evaluated value is non-zero.
(i) Simple if statement
Syntax
if (expression)
{
    statements;
}
(ii) If else statement
Syntax
if (test expression)
{
    block of statements;
}
else
{
    block of statements;
}
(iii) If else-if ladder statement
Syntax
if (expression)
{
    statements;
}
else if (expression)
{
    statements;
}
else
{
    statements;
}


Note C treats any non-zero and non-null value as true, and a value that is zero or null as false.


Flow Diagram of if-else Statement

[Figure: Flow diagram of if-else statement: if the condition is true, the if code is executed; otherwise, the else code is executed]
The switch Statement 


The switch statement tests the value of a given variable (or expression) 
against a list of case values and when a match is found, a block of 
statements associated with that case is executed. 

switch (expression)
{
    case value 1:
        statement 1;
        break;
    case value 2:
        statement 2;
        break;
    default:
        statement;
}


Note The break keyword transfers the control out of the switch statement. 
The Conditional Operators (?, :)


The ?: operator works just like an if-else statement, except that, because it is an operator, we can use it within expressions.

?: is a ternary operator in that it takes three operands. It is the only ternary operator in C language. For example,

if (x < 0)
    flag = 0;
else
    flag = 1;

can be written as

flag = (x < 0) ? 0 : 1;

Here, (x < 0) is the condition or expression, 0 is the value which will be returned if the condition holds true, and 1 is the value which will be returned if the condition holds false.


Loop Control Structure 

Loops provide a way to repeat commands. This involves repeating some portion of the program either a specified number of times or until a particular condition is satisfied.


There are three types of loops:


while Loop
initialize loop counter;
while (test loop counter using a condition/expression)
{
    do this;
    and this;
    decrement/increment loop counter;
}


for Loop
for (initialize counter; test counter; increment/decrement counter)
{
    do this;
    and this;
    and this;
}


do while Loop
initialize loop counter;
do
{
    this;
    and this;
} while (this condition is true);


The break Statement 


The break statement is used to jump out of a loop instantly, without waiting to get back to the conditional test. A break is usually associated with an ‘if’. When break is encountered inside any loop, control automatically passes to the first statement after the loop.
#include <stdio.h>
int main(void)
{
    int num, i;
    printf("Enter a number: ");
    scanf("%d", &num);
    i = 2;
    while (i <= num - 1)
    {
        if (num % i == 0)
        {
            printf("Not a prime number");
            break;
        }
        i++;
    }
    if (i == num)
        printf("Prime number");
    return 0;
}


Note When num % i turns out to be zero (i.e., num is exactly divisible by i), the message “Not a prime number” is printed and the control breaks out of the while loop.




The continue Statement 


The ‘continue’ statement is used to take the control to the beginning of the loop, bypassing the statements inside the loop which have not yet been executed.

#include <stdio.h>

int main(void)
{
    int i, j;
    for (i = 1; i <= 2; i++)
    {
        for (j = 1; j <= 2; j++)
        {
            if (i == j)
                continue;
            printf("\n%d %d", i, j);
        }
    }
    return 0;
}
Output 1 2
       2 1

When the value of i equals j, the ‘continue’ statement takes the control to the (inner) for loop, bypassing the rest of the statements pending execution in that ‘for’ loop.


goto Statement 


C supports an unconditional control statement, goto, to transfer the control 
from one point to another in a C program. The goto is a branching 
statement and requires a label. 

#include <stdio.h>

int main(void)
{
    int i, j, k;
    for (i = 1; i <= 3; i++)
    {
        for (j = 1; j <= 3; j++)
        {
            for (k = 1; k <= 3; k++)
            {
                if (i == 2 && j == 2 && k == 2)
                    goto out;
                else
                    printf("%d%d%d\n", i, j, k);
            }
        }
    }
out:
    printf("out of the loop at last!\n");
    return 0;
}


Note The usage of the goto keyword should be avoided, as it usually violates the normal flow of execution.


C Variable Types 


A variable is just a named area of storage that can hold a single value. 
There are two main variable types 


1. Local variable 2. Global variable 


Local Variable 
Scope of a local variable is confined within the block or function, where it is 
defined. 


Global Variable 

A global variable is defined at the top of the program file, and it is visible to, and can be modified by, any function that references it. Global variables are automatically initialized to zero by the system if no initial value is given. If the same variable name is used for a global and a local variable, then the local variable takes preference within its scope.


#include <stdio.h>

void func(void);

int i = 4;      /* Global definition */

int main(void)
{
    i++;        /* This is the global variable; it is incremented to 5 */
    func();
    printf("Value of i = %d ... main function\n", i);
    return 0;
}

void func(void)
{
    int i = 10; /* Local definition */
    i++;        /* This is the local variable here */
    printf("Value of i = %d ... func() function\n", i);
}
This will produce the following result
Value of i = 11 ... func() function
Value of i = 5 ... main function


Storage Classes in C 


A variable name identifies some physical location within the computer where the string of bits representing the variable's value is stored.


There are basically two kinds of locations in a computer, where such a value 
may be kept 

(i) Memory (ii) CPU registers 

It is the variable’s storage class that determines in which of the above two 
types of locations, the value should be stored. 

We have four types of storage classes in C 

1. Auto 2. Register 3. Static 4. Extern 


Auto Storage Class 

Features of this class are given below:

• Storage Location: Memory

• Default Initial Value: Garbage value

• Scope: Local to the block in which the variable is defined.

• Life: Till the control remains within the block in which the variable is defined.


Auto is the default storage class for all local variables.
int main(void)
{
    auto int i = 3;
    auto int j;     /* holds a garbage value until assigned */
    return 0;
}

Register Storage Class

Register is used to define local variables that should be stored in a register instead of RAM. Register should only be used for variables that require quick access, such as counters. The register keyword is only a request: the compiler may still keep the variable in memory. Features of the register storage class are given below:
• Storage Location: CPU register

• Default Initial Value: Garbage value

• Scope: Local to the block in which the variable is defined.

• Life: Till the control remains within the block in which the variable is defined.
int main(void)
{
    register int i;
    return 0;
}
Static Storage Class 


Static is the default storage class for global variables.
Features of the static storage class are given below:

• Storage Location: Memory

• Default Initial Value: Zero

• Scope: Local to the block in which the variable is defined. In the case of a global variable, the scope is throughout the program.

• Life: The value of the variable persists between different function calls.


#include <stdio.h>
void increment(void);
int main(void)
{
    increment();
    increment();
    return 0;
}
void increment(void)
{
    auto int i = 1;
    printf("%d\n", i);
    i = i + 1;
}
Output 1
       1

#include <stdio.h>
void increment(void);
int main(void)
{
    increment();
    increment();
    return 0;
}
void increment(void)
{
    static int i = 1;
    printf("%d\n", i);
    i = i + 1;
}
Output 1
       2

Extern Storage Class


Extern is used to give a reference to a global variable that is visible to all the program files. When we use extern, the variable can't be initialized, as all it does is point the variable name at a storage location that has been previously defined.

extern int y;    /* declaration */
int y = 3;       /* definition */

Features of the extern storage class are given below:

• Storage Location: Memory

• Default Initial Value: Zero

• Scope: Global

• Life: As long as the program's execution does not come to an end.

File 1: main.c
void write_extern(void);
int count = 5;
int main(void)
{
    write_extern();
    return 0;
}
File 2: write.c
#include <stdio.h>
extern int count;
void write_extern(void)
{
    printf("count is %d\n", count);
}
Output count is 5


Functions 


A function is a self-contained block of statements that performs a coherent task of some kind. Making a function is a way of isolating one block of code from other independent blocks of code.

#include <stdio.h>

void message(void);     /* Function prototype declaration */
int main(void)
{
    message();          /* Function call */
    printf("\nAfter calling function");
    return 0;
}
void message(void)      /* Function definition */
{
    printf("\nI am called function");
}


Output I am called function
       After calling function


A function can take a number of arguments or parameters, and a function can be called any number of times. A function can call itself; such a process is called recursion.


Functions can be of two types 
(i) Library functions (ii) User-defined functions 


Key Points


+ Functions avoid rewriting the same code again and again, i.e., they ensure reusability.

+ Separating the code into modular functions also makes the program easier to design and understand.

+ In call by value, the value of each of the actual arguments in the calling function is copied into the corresponding formal argument of the called function. With this method, the changes made to the formal arguments in the called function have no effect on the values of the actual arguments in the calling function.


Call by Value 


If we pass values of variables to the function as parameters, such kind of 
function calling is known as call by value. 

#include <stdio.h>

void swapv(int x, int y);

int main(void)              /* calling function */
{
    int a = 10, b = 20;
    /* Values of the actual arguments in the calling function
       are passed to the called function. */
    swapv(a, b);
    printf("\na = %d b = %d", a, b);
    return 0;
}
void swapv(int x, int y)    /* called function; x and y are the formal arguments */
{
    int t;
    t = x;
    x = y;
    y = t;
    printf("\nx = %d y = %d", x, y);
}
Output x = 20 y = 10
       a = 10 b = 20


Note The values of a and b remain unchanged even after exchanging or 
swapping the values of x and y. 


Call by Reference 


Variables are stored somewhere in memory. So, instead of passing the 
value of a variable, if we pass the location number/address of the variable 
to the function, then it would become ‘a call by reference’. 


Key Points


+ In the call by reference method, the addresses of the actual arguments in the calling function are copied into the formal arguments of the called function.

+ This means that, using these addresses, we would have an access to the 
actual arguments and hence we would be able to manipulate them. 
#include <stdio.h>
void swapr(int *, int *);
int main(void)
{
    int a = 10, b = 20;
    swapr(&a, &b);
    printf("\na = %d b = %d", a, b);
    return 0;
}

void swapr(int *x, int *y)
{
    int t;
    t = *x;    /* Here, we are assigning the value at address x to the variable t */
    *x = *y;
    *y = t;
}


Previous to Swapping
Variable | a | b
Value | 10 | 20
Address | 100 | 200


In swapr(&a, &b); we are passing the addresses of a and b, i.e., 100 and 200, which are received by the pointer variables x and y. Now, we swap the value at x with the value at y. That is why the output is as follows:


Output a = 20 b = 10


After Swapping
Variable | a | b
Value | 20 | 10
Address | 100 | 200


This program manages to exchange the values of a and b using their 
addresses stored in x and y. 





Pointers 


A pointer is a variable that stores a memory address. Like all other variables, it also has a name, has to be declared and occupies some space in memory. It is called a pointer because it points to a particular location.
Consider the declaration


int i = 9; this tells the C compiler to


• Reserve space in memory to hold the integer value.
• Associate the name i with this memory location.
• Store the value 9 at this location.


[Figure: memory map of i. Location name: i; value at location: 9; location number or address: 12354]
#include <stdio.h>
int main(void)
{
    int i = 9;
    printf("\nAddress of i = %p", (void *)&i);
    printf("\nValue of i = %d", *(&i));
    return 0;
}


‘&’ = Address of operator
‘*’ = Value at address operator, or ‘indirection’ operator
&i returns the address of the variable i.


*(&i) returns the value stored at a particular address, so printing the value of *(&i) is the same as printing the value of i.


Key Points


+ If we want to assign some address to a variable, then we must ensure that the variable which is going to store the address is of pointer type.
+ If we want to assign &i to j, i.e., j = &i;, then we must declare j as int *j;


• This declaration tells the compiler that j will be used to store the address of an integer value, or that j points to an integer.


int *j; means the value at the address contained in j is an integer.


Pointers are variables that contain addresses and since addresses are 
always whole numbers. Pointers would always contain whole numbers. 
int i = 3;
int *j;
j = &i;

i        j
3        65524     (values)
65524    65522     (addresses)

i's value is 3 and j's value is i's address.
printf ("\n Address of i = %p", (void *) &i);
printf ("\n Address of i = %p", (void *) j);
Output Address of i = 65524
Address of i = 65524
printf ("\n Address of j = %p", (void *) &j);
printf ("\n Value of j = %p", (void *) j);
Output Address of j = 65522
Value of j = 65524
printf ("\n Value of i = %d", *(&i));
printf ("\n Value of i = %d", *j);
Output Value of i = 3
Value of i = 3
(The addresses 65524 and 65522 are only illustrative; %p prints addresses in an implementation-defined form, usually hexadecimal.)














Array 


An array is a collection of similar elements (elements having the same data type). For an array of 100 elements, the index of the first element is zero '0' and the last index is 99. This indexed access makes it very convenient to loop through each element of the array. Array elements occupy contiguous memory locations.



































Index             0     1     2     3     4     5     6     7
Values            a     b     c     d     e     f     g     h
Memory location   100   101   102   103   104   105   106   107
A [0] = a
A [1] = b
Multi Dimensional Array


In C language, one can have arrays of any dimension. Let us consider a
3 x 3 matrix.
                    Column number [j]
                    0       1       2
Row number [i]  0   a00     a01     a02
                1   a10     a11     a12
                2   a20     a21     a22

3 x 3 matrix for multi dimensional array


To access the particular element from the array, we have to use two 
subscripts; one for row number and other for column number. The notation 
is of the form a [i] [j], where i stands for row subscripts and j stands for 
column subscripts. 


We can also define and initialize the array as follows
int values [3] [4] = {
    {1, 2, 3, 4},
    {5, 6, 7, 8},
    {9, 10, 11, 12}
};
OR
int values [3] [4] = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
A simple program using array
# include <stdio.h>
int main ( )
{
    int avg, sum = 0;
    int i;
    int marks [10];                   // array declaration
    for (i = 0; i <= 9; i++)
    {
        printf ("\n Enter marks ");
        scanf ("%d", &marks [i]);     // store data in array
    }
    for (i = 0; i <= 9; i++)
        sum = sum + marks [i];        // read data from array
    avg = sum / 10;                   // average of the 10 marks
    printf ("\n average marks = %d", avg);
    return 0;
}


Strings 


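In C language, strings are stored in an array of character (char) type along with the null terminating character '\0' at the end.
char name [ ] = {'K', 'R', 'I', 'S', 'H', 'N', 'A', '\0'};
'\0' = Null character, whose ASCII value is 0.
'0' = the digit zero, whose ASCII value is 48.
In the above declaration '\0' is necessary, because a list of individual characters is not terminated automatically. C inserts the null character automatically only when the array is initialized with a string constant, e.g., char name [ ] = "KRISHNA";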
# include <stdio.h>
int main ( )
{
    char name [ ] = "RAM";
    printf ("%s", name);
    return 0;
}
%s = It is used in printf ( ) as a format specification for printing out a string.
All the following notations refer to the same element
name [i]
*(name + i)
*(i + name)
i [name]
# include <stdio.h>
int main ( )
{
    char name [ ] = "shyam";
    char *ptr;

    ptr = name;    /* store base address of string */
    while (*ptr != '\0')
    {
        printf ("%c", *ptr);
        ptr++;
    }
    return 0;
}


Note The above program prints all the characters of a string using a pointer.
Length of String


Use the strlen ( ) function to get the length of a string, not counting the null terminating character.


Syntax size_t strlen (const char *string);


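Concatenation of String
The strcat ( ) function appends one string to another: string 2 is added at the end of string 1, which must have enough room for the result.
Syntax char *strcat (char *string1, const char *string2);


Copy String
To copy one string to another string variable, we use the strcpy ( ) function. It copies string 2 (including its '\0') into string 1.
Syntax char *strcpy (char *string1, const char *string2);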


Structures 


Structures in C are used to encapsulate, or group together, different data into one object. We can define a structure as given below


struct object
{
    char id [20];
    int rollno;
    int marks;
};


The variables we declare inside the structure are called data members.


Initializing a Structure 


Structure members can be initialized when we declare a variable of our 
structure. 


Sample Code 
struct object student1 = {"Ram", 1, 95};


The above declaration will create a struct object called student1 with an id equal to "Ram", rollno equal to 1 and marks equal to 95. To access the members of a structure, we use the "." (dot) operator, i.e., the member access operator.
struct object student1;
strcpy (student1.id, "Ram");    // an array member cannot be assigned with =
student1.rollno = 1;
student1.marks = 95;


Key Points


• The values of a structure variable can be assigned to another structure variable of the same type using the assignment operator.

• One structure can be nested within another structure. Using this facility, complex data types can be created.
e.g., struct emp
{
    char name [100];
    struct Address a;    // struct Address must be defined before struct emp
};
• A structure variable can also be passed to a function. We may either pass individual structure elements or the entire structure variable at one go.

• We can have a pointer pointing to a struct. Such pointers are known as structure pointers.


Unions 


A union is a collection of heterogeneous elements, that is, a group of elements having different data types. Each member within a structure is assigned its own memory location, but all the members of a union share a common memory location. Thus, unions are used to save memory.


union person          // Union definition
{
    char name [20];
    int age;
    float height;
};


Like structures, unions are also defined and declared. 





union person Ram; // Union declaration 


Data Structure 


A data structure is a specialised way for organising and storing data in 
memory, so that one can perform operations on it. 


Data structure is all about

• How to represent data element(s).
• What relationship data elements have among themselves.
• How to access data elements, i.e., access methods.


Types of Data Structure 


Data Structure
    Primitive: Integer, Float, Character, Pointer
    Non-primitive: Arrays, Files, Lists
        Lists
            Linear: Stack, Queues
            Non-linear: Trees, Graphs

Data structure classification


Operations on Data Structures 
The operations involved in data structures are as follows
Create Used to allocate/reserve memory for the data element(s).


Destroy This operation deallocates/destroys the memory space assigned to the specified data structure.


Selection Accessing a particular data item within a data structure.
Update Updating (inserting or deleting) data in the data structure.


Searching Used to find out whether the specified data item is present in the list of data items.


Sorting Process of arranging all data items either in ascending or in descending order.


Merging Process of combining the data items of two different sorted lists into a single sorted list.


Stack 


A stack is an ordered collection of items into which new items may be 
inserted and from which items may be deleted at one end, called the TOP 
of the stack. 


It is a LIFO (Last In First Out) kind of data structure. 


e.g., If a person wants to put a plate on the pile of plates, he has to place it on the top (PUSH: this is how the stack grows), and if he wants to take a plate from the pile of plates, he has to take it from the top (POP: this is how the stack shrinks).


A LIFO diagram
Operations on Stack 
There are two operations on a stack, as given below
• PUSH (when an item is added to the stack)
PUSH (s, i); Adds the item i to the top of stack s.
• POP (when an item is removed from the stack)
POP (s); Removes the top element of stack s and returns it as a function value.


Key Points
• Because of the PUSH operation, which adds elements to a stack, a stack is sometimes called a pushdown list.


• If a stack contains a single item and the stack is popped, then the resulting stack contains no items and is called an empty stack.


• The result of an illegal attempt to POP or access an item from an empty stack is called underflow.


• Underflow can be avoided by ensuring that the stack is not empty before a POP.


Implementation of Stack 
A stack can be implemented in two ways
1. Array 2. Linked list
But since the array size is defined at compile time, an array cannot grow dynamically. Therefore, an attempt to insert/push an element into a stack (which is implemented through an array) can cause a stack overflow situation, if the stack is already full.
So, to avoid the above mentioned problem, we can use a linked list to implement a stack, because a linked list can grow and shrink dynamically at runtime. We need to have an additional pointer (TOP) that always points to the first node of the list, i.e., the top of the stack (in the case of the linked list implementation of stack).


Applications of Stack 


There are many applications of stack; some of the important applications are given below


Function Calls 

Different ways of organising the data are known as data structures. The 
compiler uses one such data structure called stack for implementing 
normal as well as recursive function calls. 


# include <stdio.h>
int add (int, int);
int main ( )
{
    int a = 5, b = 2, c;
    c = add (a, b);
    printf ("sum = %d", c);
    return 0;
}

int add (int i, int j)
{
    int sum;
    sum = i + j;
    return sum;
}

Stack contents during the call (figure)
• When the call to add ( ) is met: the stack is empty.
• Before transferring control to add ( ): the address of the printf ( ) statement (where execution resumes), a copy of a (5) and a copy of b (2) are pushed.
• After control reaches add ( ): sum (7) is pushed on top of them.
• While returning control from add ( ): sum is popped.
• Again, on returning control from add ( ): the remaining entries are popped; at last the stack is empty.
Expression Evaluation
A stack can be used for checking the syntax of an expression and for evaluating it.
• Infix expression It is the one where the binary operator comes between the operands, e.g., A + B * C.
• Postfix expression Here, the binary operator comes after the operands, e.g., ABC * +
• Prefix expression Here, the binary operator precedes the operands, e.g., + A * BC


Handbook Computer Science & IT 55 


This prefix expression is equivalent to the infix expression A + (B * C).
Prefix notation is also known as Polish notation.

Postfix notation is also known as suffix or Reverse Polish notation.
Operator precedence

Exponential operator ^ has the highest precedence.
Multiplication/Division *, / operators have the next precedence.
Addition/Subtraction +, - operators have the least precedence.


Reversing a List
First push all the elements of the string onto a stack and then pop them one by one; because a stack is LIFO, they come out in reverse order.


Queue 


It is a non-primitive, linear data structure in which elements are added/inserted at one end (called the REAR) and elements are removed/deleted from the other end (called the FRONT). A queue is logically a FIFO (First In First Out) type of list.


e.g., Consider a line of persons at a railway reservation counter. Whenever a person enters the queue, he stands at the end of the queue (addition of nodes to the queue).


Note Every time the person at the front of the queue deposits the fee for the ticket, he leaves the queue (deletion of nodes from the queue).


Queue Implementation 

Queue can be implemented in two ways

• Static implementation (using arrays)

• Dynamic implementation (using pointers)


Key Points


• In static implementation, we have to declare the size of the array at design time or before the processing starts.

• When queues are implemented using pointers, the main disadvantage is that a node in a linked representation occupies more memory space than a corresponding element in an array representation.
The beginning of the array acts as Front and the last location of the array acts as Rear.


Total number of elements present in the queue = (Rear - Front + 1)


Circular Queue 


In a circular queue, the first element comes just after the last element, i.e., a circular queue is one in which a new element is inserted at the very first location of the queue if the last location of the queue is full and the first location is empty.


Note A circular queue overcomes the problem of unutilised space in linear queues implemented as arrays.


We can make the following assumptions for a circular queue
• Front will always be pointing to the first element (as in a linear queue).
• If Front = Rear, the queue will be empty.
• Each time a new element is inserted into the queue, Rear is incremented by 1, wrapping around from the last slot to the first (from Q [4] back to Q [0] in a queue of 5 slots):
Rear = (Rear + 1) mod 5
• Each time an element is deleted from the queue, the value of Front is incremented by 1 in the same circular manner:
Front = (Front + 1) mod 5

Circular queue (figure: slots Q [0] to Q [4] arranged in a circle)


Double Ended Queue (DEQUE) 

It is a list of elements in which insertion and deletion operations are performed at both the ends. That is why it is called a double-ended queue or DEQUE.

There are two types of DEQUE, obtained by restricting either the insertion or the deletion to only one end.

• Input restricted DEQUE

• Output restricted DEQUE


DEQUE (figure: elements D [0], D [1], ... lie between FRONT and REAR, with insertion and deletion possible at both ends)
Operations on DEQUE

There are the following operations on a DEQUE

1. Insertion of an element at the REAR end of the queue.
2. Deletion of an element from the FRONT end of the queue.
3. Insertion of an element at the FRONT end of the queue.
4. Deletion of an element from the REAR end of the queue.


Key Points


• For an input restricted DEQUE, only the 1st, 2nd and 4th operations are valid (insertion is allowed at the REAR end only).
• For an output restricted DEQUE, only the 1st, 2nd and 3rd operations are valid (deletion is allowed at the FRONT end only).


Priority Queues 

This type of queue enables us to retrieve data items on the basis of the priority associated with them. Below are the two basic priority queue choices
Sorted Array or List It is very efficient to find and delete the smallest element, but maintaining sortedness makes the insertion of new elements slow.
Binary Heaps Here, the minimum key (in a min heap) is always at the root of the heap. Binary heaps are the right choice whenever we know an upper bound on the number of items in our priority queue, since we must specify the array size at compile time.


Applications of Queues

• Round Robin (RR) technique for processor scheduling is implemented using queues.

• All types of customer service centre software (e.g., railway reservation) are designed using queues.

• Printer server routines are designed using queues.


Linked List 


It is a special data structure in which data elements are linked to one another. Here, each element is called a node, which has two parts
• Info part, which stores the information.
• Address or pointer part, which holds the address of the next element of the same type.
Because each node contains a pointer to a node of its own type, a linked list node is also known as a self referential structure.
struct linked_list_node
{
    int info;
    struct linked_list_node *next;
};
The above is the syntax of declaring a node which contains two fields: one for storing information and another for storing the address of the next node, so that one can traverse the list.


[info | address] -> [info | X]


Linked list process flow diagram


Note The last node of a linked list contains the NULL value in its address field.


Advantages of Linked List 


• Linked lists are dynamic data structures, as they can grow and shrink during execution time.

• Efficient memory utilisation, because here memory is not pre-allocated.
• Insertions and deletions can be done very easily at the desired position.


Disadvantages of Linked List
• More memory is required, if the number of fields is more.
• Access to an arbitrary data item is time consuming.
Types of Linked List
There are different types of linked list, as given below
Singly Linked List


In this type of linked list, each node has only one address field, which points to the next node. So, the main disadvantage of this type of list is that we can't access the predecessor of a node from the current node.
Start -> [10 | •] -> [20 | •] -> [30 | X]
Singly linked list
Doubly Linked List 


Each node of a doubly linked list has two address fields (or links), which help in accessing both the successor node (next node) and the predecessor node (previous node).


X <- [Prev | 10 | Next] <-> [Prev | 20 | Next] <-> [Prev | 30 | Next] -> X
Doubly linked list



























































Circular Linked List 
The link (or address) field of the last node holds the address of the first node.
Start -> [10 | •] -> [20 | •] -> [30 | •] -> (back to the first node)
Circular linked list
Key Points


• Null Pointer The address field of the last node contains the NULL value to indicate the end of the list.
• External Pointer (Start Node) Pointer to the very first node of a linked list; it enables us to access the entire linked list.
• Empty List If no nodes are present in the list, it is an empty list or NULL list. The value of the external pointer/start pointer will be NULL for an empty list.
Start = NULL;
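Circular Doubly Linked List
It has both the previous and the next pointers linked in a circular manner: the next pointer of the last node points to the first node, and the previous pointer of the first node points to the last node.


Circular doubly linked list (figure: Start points to the first node, Last to the last node, and the two ends are joined in both directions)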



































Operations on Linked Lists 
The following operations are performed on a linked list


Creation
Used to create a linked list.


Insertion

Used to insert a new node in a linked list at the specified position. A new node may be inserted

• At the beginning of a linked list
• At the end of a linked list
• At the specified position in a linked list
• In case of an empty list, a new node is inserted as the first node.


Deletion

This operation is used to delete an item (a node). A node may be deleted from the

• Beginning of a linked list.
• End of a linked list.
• Specified position in the list.


Traversing
It is the process of going through (accessing) all the nodes of a linked list from one end to the other end.


Memory Allocation: Garbage Collection 
in Linked List 

There should be some mechanism

• which provides unused memory space for new nodes.
• which makes the memory space of deleted nodes available for future use.
Situation of Overflow and Underflow


Suppose someone wants to insert new data into a data structure but there is no available space, i.e., the free storage list is empty. This situation is usually called OVERFLOW. The situation where one wants to delete data from a data structure that is empty is referred to as UNDERFLOW.


If START = NULL and someone still wants to delete data, then an UNDERFLOW situation occurs.


(figure: the data list START -> Node A -> Node B, and the free storage list headed by AVAIL)


Key Points
• Together with the linked lists in memory, a special list is maintained which consists of unused memory cells.


• This special list has its own pointer and is called the "List of Available Space" or the "Free Storage List" or the "Free Pool".


Insertion of a Node at Specific Position in a Linked List
Suppose we are given the value of LOC, where either LOC is the location of a Node A in a linked list or LOC = NULL. Let N be the new node, whose location is NEW (it is taken from the free pool).

We need to insert ITEM into the list, so that ITEM follows Node A or, when LOC = NULL, so that ITEM is the first node. After the insertion
• The next pointer field of Node A now points to the new Node N, to which AVAIL previously pointed.
• AVAIL now points to the second node in the free pool, to which Node N previously pointed.
• The next pointer field of Node N now points to Node B, to which Node A previously pointed.

If LOC is not NULL, then we let Node N point to Node B (which originally followed Node A) by the assignment

LINK [NEW] = LINK [LOC]
and we let Node A point to the new Node N by the assignment

LINK [LOC] = NEW
Deletion from a Linked List


(figure: the data list START -> Node A -> Node N -> Node B, and the free pool headed by AVAIL)


In the above diagram, the three pointer fields are changed as follows when Node N is deleted

• The next pointer field of Node A now points to Node B, where Node N previously pointed.

• The next pointer field of N now points to the original first node in the free pool, where AVAIL previously pointed.

• AVAIL now points to the deleted Node N.


Polynomials (An Application of Linked Lists) 

A polynomial p(x) =2x° — 3x* — 9x? + 4, may be represented by list, where 
each node corresponds to a non-zero term of p(x). 

The information part of each node is divided into two fields representing the coefficient and the exponent of the corresponding term, respectively. All nodes are linked according to decreasing degree.


Parts of polynomial (figure: POLY points to a header node, followed by one node per non-zero term; each node holds the coefficient of the term and the exponent of the term)


Tree (Non-linear Data Structures) 


Trees are used to represent data containing a hierarchical relationship between elements, e.g., records, family trees and tables of contents.


A tree diagram (figure: root A at level 0, with nodes down to level 3)
Height = Number of levels = Max level number + 1 = 3 + 1 = 4


For the above tree
Number of levels = 4
Max level number = 3
• Node Each data item in a tree.
• Root First or top data item in the hierarchical arrangement.
• Degree of a Node Number of subtrees of a given node.
e.g., Degree of A = 3, Degree of E = 2
• Degree of a Tree Maximum degree of a node in a tree.
e.g., Degree of the above tree = 3
• Depth or Height Maximum level number of a node + 1 (i.e., level number of the farthest leaf node of a tree + 1).
e.g., Depth of the above tree = 3 + 1 = 4
• Non-terminal Node Any node, except the root node, whose degree is not zero.
• Terminal Node or Leaf Nodes having degree = 0.
• Forest Set of disjoint trees.
• Siblings Nodes having the same parent, e.g., D and G are siblings, as both are children of Node B.
• Path Sequence of consecutive edges from the source node to the destination node.


e.g., Path between A and K comprises the (A, E), (E, J) and (J, K) node pairs.
Binary Tree

In this kind of tree, the degree of any node is at most 2.
A binary tree T is defined as a finite set of elements, called nodes, such that either

• T is empty (called the NULL tree or empty tree), or

• T contains a distinguished Node R, called the root of T, and the remaining nodes of T form an ordered pair of disjoint binary trees T1 and T2.


Successor and subtree diagram (figure: root node A; its left successor is the root of the left subtree of A and its right successor is the root of the right subtree of A)


Any node N in a binary tree T has either 0, 1 or 2 successors. Level r of a binary tree T can have at most $2^r$ nodes.


Extended Binary Trees : 2-Trees or Strictly 
Binary Trees 


If every non-terminal node in a binary tree has a non-empty left subtree and a non-empty right subtree, the tree is called a strictly binary tree. In other words, if every node of a binary tree has either 0 or 2 child nodes, then such a tree is known as a strictly binary tree or extended binary tree or 2-tree.


Extended binary tree (figure: each node of this tree has either 0 or 2 child nodes)
Complete Binary Tree

A complete binary tree is a binary tree which has the following properties

• Each node can have 0, 1 or 2 child nodes.

• In each level, the left node is filled first and then the right node.

• We can start putting data items in the next level only when the previous level is completely filled.


If all the above three conditions are fulfilled by a binary tree, then we can call it a complete binary tree.


Complete binary tree (figure: 12 nodes numbered 1 to 12 level by level; level 3 holds nodes 8 to 12, so node 12 is the left child of node 6)


Now, if we want to add a new item to the above tree, then we need to add it as the right child of node 6. We can't add it anywhere else (according to conditions 2 and 3).


Tree Traversal 


Three types of tree traversal are given below

1. Preorder (Root, Left subtree, Right subtree)
2. Postorder (Left subtree, Right subtree, Root)
3. Inorder (Left subtree, Root, Right subtree)


The detailed description of the different types of tree traversal is given below, for a tree with root A whose left subtree contains B, D, E and whose right subtree contains C, F. A bracketed group [ ] stands for a whole subtree, which is itself traversed by the same rule.


Preorder
Root A comes first, then the left subtree, and the right subtree comes last.
A, [B D E], [C F]
Preorder in tree traversal

Postorder
The left subtree comes first, then the right subtree, and the root comes last.
[B D E], [C F], A
Postorder in tree traversal

Inorder
The left subtree comes first, the root node comes in between the left and right subtrees, and the right subtree comes last.
[B D E], A, [C F]
Inorder in tree traversal


Note In all of the above three techniques, we need to decompose the left and right subtrees according to the respective rules only.


Breadth First Traversal (BFT) 


The BFT of a tree visits all the nodes in the order of their depth in the tree. BFT first visits all the nodes at depth zero (i.e., the root), then all the nodes at depth 1 and so on. At each depth, the nodes are visited from left to right.


Result of BFT = A, B, C, D, E, F


BFT diagram (figure: root A with children B and C; B has children D and E; C has child F)


Depth First Traversal (DFT)


In DFT, one starts from the root and explores as far as possible along each branch before backtracking.


Result of DFT = A, B, D, E, C, F


A DFT diagram (figure: the same tree as above)
Perfect Binary Tree or Full Binary Tree


A binary tree in which all leaves are at the same level (or the same depth) and in which every parent has 2 children.


Full binary tree (figure: A at level 0; B and C at level 1; D, E, F and G at level 2)


Here, all leaves (D, E, F, G) are at depth 3 or level 2 and every parent has exactly 2 children.


Key Points


• BFT uses a queue for traversing.
• DFT uses a stack for traversing.


Some Important Formulas of Binary Tree 
Number of nodes 'n' in a perfect binary tree can be found using the following formula
$n = 2^h - 1$
where h = height of the perfect binary tree, i.e., the number of levels in the tree (maximum level number + 1). The number of nodes 'n' in a complete binary tree is at least $2^{h-1}$ and at most $2^h - 1$.


In a complete binary tree whose nodes are numbered 1, 2, 3, ... level by level, if we want to find out the children and parent of any node k, we use the following formulas

Left child of the node k = $2k$
Right child of the node k = $2k + 1$
Parent of the node k = $\lfloor k/2 \rfloor$

e.g., here k = 2
Left child of node 2 = 2 * 2 = 4
Right child of node 2 = (2 * 2) + 1 = 5
Parent of node 2 = $\lfloor 2/2 \rfloor$ = 1


The depth $d_n$ of the complete binary tree $T_n$ with n nodes is given by
$d_n = \lfloor \log_2 n \rfloor + 1$
Suppose any complete tree $T_n$ has 8 nodes, then
$d_n = \lfloor \log_2 8 \rfloor + 1 = \lfloor \log_2 2^3 \rfloor + 1 = \lfloor 3 \log_2 2 \rfloor + 1 = 3 + 1 = 4$
The number of nodes in a perfect binary tree can also be found using the formula $n = 2L - 1$, where L is the number of leaf nodes in the tree. The number of null links in a binary tree of 'n' nodes is equal to (n + 1).





Binary Search Tree (BST) 


A binary tree T is called a binary search tree (or binary sorted tree) if each node N of T has the following property: the value at N is greater than every value in the left subtree of N and is less than or equal to every value in the right subtree of N.


Binary search tree (figure: root 38; every value in the left subtree of 38 < 38 <= every value in the right subtree of 38)
AVL-Tree 


The disadvantage of a BST is that if every item to be inserted next is greater than the previous item, then we get a right skewed BST, and if every item to be inserted is less than the previous item, then we get a left skewed BST.


Left skewed BST (figure)
So, to overcome the skewness problem in BST, the concept of the AVL-tree, or height balanced tree, came into existence.


A non-empty binary tree T is an AVL-tree if and only if
$|h(T_L) - h(T_R)| \le 1$
where $h(T_L)$ = height of the left subtree $T_L$ of tree T
$h(T_R)$ = height of the right subtree $T_R$ of tree T


$h(T_L) - h(T_R)$ is also known as the Balance Factor (BF). For an AVL (or height balanced) tree, the balance factor can be either 0, 1 or -1. An AVL search tree is a binary search tree which is also an AVL-tree.


e.g., for node C: $h(T_L \text{ of } C) = 1$ and $h(T_R \text{ of } C) = 2$
BF of C = $h(T_L \text{ of } C) - h(T_R \text{ of } C) = 1 - 2 = -1$


Key Points


• In an AVL-tree, every node should have a BF of either 0, 1 or -1.
• If the BF of any node does not fall in this range, then the tree is not an AVL-tree.


m-way Search Tree 


To favour the retrieval and manipulation of data stored in external memory, viz. storage devices such as disks, there is a need for some special data structures, e.g., m-way search trees, B-trees and B+ trees. An m-way search tree T may be an empty tree; if T is non-empty, it satisfies the following properties

• For some integer m, known as the order of the tree, each node is of degree which can reach a maximum of m; in other words, each node has at most m child nodes.

• A node may be represented as $A_0, (k_1, A_1), (k_2, A_2), \ldots, (k_{m-1}, A_{m-1})$, where $k_i$, $1 \le i \le m-1$, are the keys and $A_i$, $0 \le i \le m-1$, are the pointers to subtrees of T.

• If a node has k child nodes, where $k \le m$, then the node can have only $(k - 1)$ keys $k_1, k_2, \ldots, k_{k-1}$ contained in the node, such that $k_i < k_{i+1}$, and each of the keys partitions all the keys in the subtrees into k subsets.

• For a node $A_0, (k_1, A_1), (k_2, A_2), \ldots, (k_{m-1}, A_{m-1})$, all key values in the subtree pointed to by $A_i$ are less than the key $k_{i+1}$, $0 \le i \le m-2$, and all key values in the subtree pointed to by $A_{m-1}$ are greater than $k_{m-1}$.

• Each of the subtrees $A_i$, $0 \le i \le m-1$, is also an m-way search tree.

• Below is an example of a 5-way search tree. Here, m = 5, which means each node can have at most m = 5 child nodes and therefore has at most (m - 1) = 4 keys contained in it.


5-way search tree (figure: root node with keys k1 to k4 = 18, 44, 76, 198 and pointers A0 to A4 to its subtrees, which hold keys such as 80, 92, 141 and 272, 286, 350; NULL pointers are marked X)


In the above diagram, we can see that $k_1 < k_2 < k_3 < k_4$ for each node.


B-Tree 


m-way search trees have the advantage of minimising file accesses due to their restricted height. But still, we need to keep the height of an m-way search tree as low as possible, and therefore the need to maintain balanced m-way search trees arises.
So, a B-tree is nothing but a balanced m-way search tree. A B-tree of order m, if non-empty, is an m-way search tree in which

• The root has at least 2 child nodes and at most m child nodes.

• The internal nodes, except the root, have at least $\lceil m/2 \rceil$ child nodes and at most m child nodes.

• The number of keys in each internal node is one less than the number of child nodes, and these keys partition the keys in the subtrees of the node in a manner similar to that of m-way search trees.

• All leaf nodes are on the same level.
A B-tree of order 3 is referred to as a 2-3 tree, since its internal nodes are of degree 2 or 3 only.


B-Tree (figure: a root node with key fields 56, 64 and 85 and pointer fields leading to leaf nodes such as 49 51 52, 58 62 and 67 75; the pointer fields of the leaf nodes are NULL, marked X)
B+ Trees


• It is a variant of the B-tree in which all records are stored in the leaves and all leaves are linked sequentially.

• A B+ tree consists of one or more blocks of data, called nodes, linked together by pointers.

• Internal nodes point to other nodes in the tree, and leaf nodes point to data in the database using data pointers.

• In a B+ tree, in contrast to a B-tree, all records are stored at the lowest level of the tree.

Graph 


A graph is a set of vertices and the edges which connect them. A graph G
consists of two things
1. A set of vertices V.
2. A set of edges E such that each edge e in E is identified with a unique pair
u, v of vertices in V, denoted by e = {u, v}.

• Path A path P of length n from a node u to a node v is defined as a
sequence of (n + 1) nodes

$P = (v_0, v_1, v_2, \ldots, v_n)$, where $u = v_0$, $v = v_n$ and each $v_{i-1}$ is adjacent to $v_i$.

• Adjacent Nodes or Neighbours Suppose e = [u, v]; then the nodes u
and v are called the endpoints of e, and u and v are said to be adjacent
nodes or neighbours.

• Degree of a Node The number of edges containing u is the degree of the
node, deg(u). If deg(u) = 0, then u doesn't belong to any edge, and
u is called an isolated node.

• Connected Graph A graph G is said to be connected, if there is a path
between any two of its nodes.

• Complete Graph A graph G is said to be complete, if
every node u in G is adjacent to every other node v in G.

Number of edges in a complete graph G with n nodes $= \frac{n(n-1)}{2}$

[Figure: Graph on nodes A, B and C]
• Tree Graph or Free Tree/Tree A connected graph T
without any cycles is known as a tree graph, a free tree or
simply a tree.
If T is a finite tree with k nodes, then T will have (k − 1)
edges.

[Figure: Tree on nodes A, B, C, D and E]

• Labelled and Weighted Graph A graph G is said to be
labelled, if its edges are assigned data. G is said to be
weighted, if each edge e in G is assigned a non-negative numerical value.


Handbook Computer Science & IT 13 


w(e) = Weight or length of edge e

[Figure: Weighted graph on nodes A, B, C, D and E, with edge weights 4, 3, 3, 3 and 6.]


Multigraph If a graph has multiple edges and/or
loops, then it is called a multigraph.


Multiple Edges Distinct edges e and e′ are called multiple edges, if they
connect the same endpoints, i.e., if e = [u, v] and e′ = [u, v].

Loops An edge e is called a loop, if it has identical start point and end
point, i.e., if e = [u, u].

Directed Graph (Digraph) 

Each edge in graph G is assigned a direction, i.e., each edge e is identified
with an ordered pair (u, v) of nodes of G rather than an unordered pair [u, v].

[Figure: directed edge e = (u, v) drawn from u, the origin or initial point of e, to v, the destination or terminal point of e.]

u is a predecessor of v, and v is a successor or neighbour of u.

u is adjacent to v, and v is adjacent from u.
• Outdegree of a node u in G
outdeg(u) = Number of edges beginning at u
• Indegree of a node u in G
indeg(u) = Number of edges ending at u

• A node u is called a source, if it has positive outdegree but zero indegree.
• A node u is called a sink, if it has zero outdegree but positive indegree.
e.g., [Figure: directed graph on nodes A, B and C]
indeg(C) = 2; outdeg(C) = 0
So, C is a sink and there is no source node in this graph.


Adjacency Matrix 


Suppose G is a simple directed graph with m nodes, and suppose the nodes of G have
been ordered and are called $v_1, v_2, \ldots, v_m$.
Then, the adjacency matrix $A = (a_{ij})$ of the graph G is the m × m matrix defined as
follows

$$a_{ij} = \begin{cases} 1 & \text{if } v_i \text{ is adjacent to } v_j \text{, i.e., if there is an edge } (v_i, v_j) \\ 0 & \text{otherwise} \end{cases}$$

Such a matrix, which contains entries of only 0 and 1, is called a bit matrix or a
Boolean matrix.

The adjacency matrix A of the graph G does depend on the ordering of the nodes of G,
i.e., a different ordering of the nodes may result in a different adjacency matrix.
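A small C sketch of the adjacency matrix together with the degree definitions given earlier. It is an illustration only: nodes are numbered 0 to m − 1 instead of v1 to vm, and `MAXV`, `build_adj`, `outdeg`, `indeg`, `is_source` and `is_sink` are hypothetical names. The outdegree of u is the sum of row u, and the indegree of u is the sum of column u.

```c
#include <stdbool.h>

#define MAXV 10   /* hypothetical upper limit on the number of nodes */

/* Fill adj (m x m) from e directed edges (from[k], to[k]): a_ij = 1 for edge (v_i, v_j). */
void build_adj(int adj[][MAXV], int m, const int from[], const int to[], int e)
{
    for (int i = 0; i < m; i++)
        for (int j = 0; j < m; j++)
            adj[i][j] = 0;
    for (int k = 0; k < e; k++)
        adj[from[k]][to[k]] = 1;
}

/* outdeg(u): number of edges beginning at u = sum of row u */
int outdeg(int adj[][MAXV], int m, int u)
{
    int d = 0;
    for (int j = 0; j < m; j++)
        d += adj[u][j];
    return d;
}

/* indeg(u): number of edges ending at u = sum of column u */
int indeg(int adj[][MAXV], int m, int u)
{
    int d = 0;
    for (int i = 0; i < m; i++)
        d += adj[i][u];
    return d;
}

/* source: positive outdegree, zero indegree */
bool is_source(int adj[][MAXV], int m, int u)
{
    return outdeg(adj, m, u) > 0 && indeg(adj, m, u) == 0;
}

/* sink: zero outdegree, positive indegree */
bool is_sink(int adj[][MAXV], int m, int u)
{
    return outdeg(adj, m, u) == 0 && indeg(adj, m, u) > 0;
}
```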


Design and Analysis 
of Algorithms 


Introduction to Analysis of Algorithms 


An algorithm is a finite set of instructions that, if followed, accomplishes a
particular task. Informally, an algorithm is any well defined computational
procedure that takes some value or set of values as input and produces
some value or set of values as output. Thus, an algorithm is a sequence of
computational steps that transform the input into the output.


Some terms related to algorithms are given as follows


Computational Problem 


A computational problem is a specification of the desired input-output
relationship. Computational complexity theory classifies computational problems according to
their inherent difficulty and relates those classes to each other.


Instance 


An instance of a problem is all the inputs needed to compute a solution to 
the problem. A computational problem can be viewed as an infinite 
collection of instances together with a solution for every instance. 


Algorithm 


An algorithm is a well defined computational procedure that transforms 
inputs into outputs, achieving the desired input-output relationship. 
Correct Algorithm


A correct algorithm halts with the correct output for every input instance. 
We can say that the algorithm solves the problem. 


Analysis of Algorithms 


Analysis of an algorithm can consider various resources such as memory,
communication bandwidth or computer hardware. The measure most often used for
analysing algorithms is the computational time that an algorithm requires
for completing the given task. The running time of an algorithm is measured by
counting the number of steps it executes.


Complexity 


Algorithms can be classified by the amount of time they need to complete,
compared to their input size.


The analysis of an algorithm focuses on the complexity of the algorithm, which is of two types
• Time complexity
• Space complexity


Time Complexity 

The time complexity is a function that gives the amount of time required by 
an algorithm to run to completion. The time complexity quantifies the 
amount of time taken by an algorithm to run as a function of the length of 
the string representing the input. 


Space Complexity 
The space complexity is a function that gives the amount of space required 
by an algorithm to run to completion. 


Key Points


+ We usually express the time complexity in three terms
(i) Best case time complexity
(ii) Average case time complexity
(iii) Worst case time complexity

+ Worst case time complexity It is the function defined by the maximum
amount of time needed by an algorithm for an input of size n.

+ Average case time complexity It is the expected amount of time needed by an
algorithm over all inputs of size n (usually assuming every input is equally likely).

+ Best case time complexity It is the minimum amount of time that an
algorithm requires for an input of size n.
Asymptotic Analysis
Asymptotic analysis has the following features
• It describes the behaviour of a function in the limit, i.e., as its input grows large.


• There are multiple asymptotic notations that compare the sizes of
functions.
O (Big Oh) ≈ ≤ {Asymptotic upper bound}
Ω (Big Omega) ≈ ≥ {Asymptotic lower bound}
Θ (Big Theta) ≈ = {Asymptotic tight bound}
o (Small Oh) ≈ < {Upper bound that is not asymptotically tight}
ω (Small Omega) ≈ > {Lower bound that is not asymptotically tight}


Asymptotic Notations 


The notations we use to describe the asymptotic running time of an
algorithm are defined in terms of functions whose domains are the set of
natural numbers N = {0, 1, 2, ...}.


The asymptotic notations consist of the following useful notations
Big Oh (O)
If we write $f(n) = O(g(n))$, then there exist positive constants c and $n_0$ such that
$0 \le f(n) \le c\,g(n)$ for all $n \ge n_0$.
Or we can say g(n) is an asymptotic upper bound for f(n).
e.g., $2n^2 = O(n^3)$ with constants c = 1 and $n_0 = 2$.


Big Omega (Ω)
If we write $f(n) = \Omega(g(n))$, then there exist positive constants c and $n_0$ such that
$0 \le c\,g(n) \le f(n)$ for all $n \ge n_0$.
Function g(n) is an asymptotic lower bound for f(n).
e.g., $\sqrt{n} = \Omega(\log_2 n)$ with constants c = 1 and $n_0 = 16$.


Big Theta (Θ)
If we write $f(n) = \Theta(g(n))$, then there exist positive constants $c_1$, $c_2$ and $n_0$ such that
$0 \le c_1\,g(n) \le f(n) \le c_2\,g(n)$ for all $n \ge n_0$.
Function g(n) is an asymptotically tight bound for f(n).
Theorem $f(n) = \Theta(g(n))$ if and only
if $f(n) = O(g(n))$ and $f(n) = \Omega(g(n))$.


Small Oh (o)


Notation If we write $f(n) = o(g(n))$, then for every positive constant c there exists a constant $n_0 > 0$ such that
$0 \le f(n) < c\,g(n)$ for all $n \ge n_0$.


Function g(n) is an upper bound of f(n) that is not asymptotically tight.

e.g., $n^{1.99999} = o(n^2)$, $\frac{n^2}{\log_2 n} = o(n^2)$, but $n^2 \ne o(n^2)$


Small Omega (ω)
Notation If we write $f(n) = \omega(g(n))$, then for every positive constant c there exists a constant $n_0 > 0$ such that
$0 \le c\,g(n) < f(n)$ for all $n \ge n_0$.


Function g(n) is a lower bound of f(n) that is not asymptotically tight.
e.g., $n^{2.00001} = \omega(n^2)$
and $n^2 \ne \omega(n^2)$
Comparisons of Asymptotic Notations
Based on the Relational Properties
Assume f(n), g(n) and h(n) are asymptotically positive.
The relational properties are given as under
Transitivity
$f(n) = \Theta(g(n))$ and $g(n) = \Theta(h(n))$
$\Rightarrow f(n) = \Theta(h(n))$
This property also holds for O, Ω, o and ω.
Reflexivity
$f(n) = \Theta(f(n))$
This property also holds for O and Ω.
Symmetry
$f(n) = \Theta(g(n))$ if and only if $g(n) = \Theta(f(n))$
Transpose Symmetry
$f(n) = O(g(n))$ if and only if $g(n) = \Omega(f(n))$
$f(n) = o(g(n))$ if and only if $g(n) = \omega(f(n))$
Recurrences
A recurrence is a function defined in terms of
• One or more base cases • Itself with smaller arguments
e.g.,
$$T(n) = \begin{cases} 1 & \text{if } n = 1 \\ T(n-1) + 1 & \text{if } n > 1 \end{cases}$$
whose solution is $T(n) = n$.


In algorithm analysis, we usually express both the recurrence and its 
solution using asymptotic notation. 


Methods to Solve the Recurrence Relation 
Two methods to solve a recurrence relation are given below
1. Substitution method 2. Master method


Substitution Method 

The substitution method is a condensed way of proving an asymptotic 
bound on a recurrence by induction. In the substitution method, instead of 
trying to find an exact closed form solution, we only try to find a closed form 
bound on the recurrence. 


There are two steps in this method
• Guess the form of the solution.
• Use induction to find the constants and show that the solution works.


Key Points
+ The substitution method is a powerful approach that is able to prove upper
bounds for almost all recurrences.
+ The substitution method requires the use of induction.
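As an illustration of the two steps, consider the recurrence $T(n) = 2T(\lfloor n/2 \rfloor) + n$ (chosen here as an example; it is not taken from the text above). We guess $T(n) = O(n \log_2 n)$, assume $T(m) \le c\,m \log_2 m$ for all $m < n$ (the induction hypothesis), and substitute it into the recurrence:

```latex
\begin{aligned}
T(n) &\le 2\left(c \lfloor n/2 \rfloor \log_2 \lfloor n/2 \rfloor\right) + n \\
     &\le c\,n \log_2 (n/2) + n \\
     &= c\,n \log_2 n - c\,n + n \\
     &\le c\,n \log_2 n \qquad \text{for } c \ge 1
\end{aligned}
```

The small base cases (for example n = 2 and n = 3) still have to be checked separately when choosing c and $n_0$.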


Master Method 
The master method gives us a quick way to find solutions to recurrence
relations of the form $T(n) = aT(n/b) + f(n)$,


where a and b are constants, $a \ge 1$ and $b > 1$. Here, a represents how
many recursive calls are made and b represents the factor by which the problem size is
reduced in each recursive call. f(n) represents how much work is done by
each call outside its recursive calls (dividing the problem and combining the results).
Solution of Recurrence Relation
• The solution of the recurrence relation will be in the form of

$$T(n) = n^{\log_b a}\,[T(1) + U(n)], \qquad h(n) = \frac{f(n)}{n^{\log_b a}}$$

• Now, the value of U(n) depends on h(n) in three cases as follows
Case 1 If $h(n) = n^r$, $r < 0$, then $U(n) = O(1)$
Case 2 If $h(n) = n^r$, $r > 0$, then $U(n) = O(n^r)$
Case 3 If $h(n) = (\log_b n)^i$, $i \ge 0$, then $U(n) = \Theta\left(\frac{(\log_b n)^{i+1}}{i+1}\right)$


T(n) = O(n log n) 


Searching 


In computer science, search operation is used for finding an item with 
specified properties among a collection of items. 


Searching can be done in two ways 


Sequential Search (Linear Search) 


This is the most common searching algorithm: it starts at the beginning and walks
to the end, testing for a match at each item. This searching has the benefit
of simplicity.

Pseudo code for sequential searching 

int linearsearch (int a[ ], int first, int last, int key)
{
    for (int i = first; i <= last; i++)
    {
        if (key == a[i])
            return i;      // key successfully found: return its location
    }
    return -1;             // failed to find the key element
}




Flow Chart for Linear Search

[Figure: flow chart starting with First = 0, Last = length − 1; on the "no" branch First = First + 1 and the loop repeats; the exits are Target found and Target is not found.]

Analysis of Sequential Search

The time complexity of sequential search in all three cases is given below
Best case When we find the key at the first location of the array, the
complexity is O(1).

Worst case When the key is not found in the array and we have to scan the
complete array, the complexity is O(n).

Average case When the key is equally likely to appear at any of the n positions, a
successful search takes 1, 2, 3, ..., n comparisons respectively, i.e., in total

$$1 + 2 + 3 + \cdots + n = \frac{n(n+1)}{2} \text{ comparisons}$$

$$\text{Average number of comparisons} = \frac{\text{Total number of comparisons}}{\text{Total number of items}} = \frac{n(n+1)}{2} \times \frac{1}{n} = \frac{n+1}{2}$$

So, the time complexity = O(n)
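The (n + 1)/2 average can be checked by counting comparisons directly. This sketch (the function names are hypothetical) assumes the array holds distinct keys, performs one successful search for each of them, and averages the comparison counts.

```c
/* Linear search on a[0..n-1]; *comparisons receives the number of key comparisons. */
int linear_search_count(const int a[], int n, int key, int *comparisons)
{
    *comparisons = 0;
    for (int i = 0; i < n; i++) {
        (*comparisons)++;
        if (a[i] == key)
            return i;          /* found at index i after i + 1 comparisons */
    }
    return -1;                 /* not found after n comparisons */
}

/* Average comparisons over the n successful searches (one per element). */
double average_successful_comparisons(const int a[], int n)
{
    long total = 0;
    int c;
    for (int i = 0; i < n; i++) {
        linear_search_count(a, n, a[i], &c);
        total += c;
    }
    return (double)total / n;
}
```

For n = 5 the successful searches cost 1, 2, 3, 4 and 5 comparisons, whose average is 3 = (5 + 1)/2; an unsuccessful search costs all 5.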
Binary Search


In computer science, a binary search (half interval search) algorithm finds
the position of a specified value within a sorted array. In each step, this
algorithm divides the array into three sections

• Middle element

• Elements on left side of the middle element

• Elements on right side of the middle element


Then, if the middle element is the required value, the searching
stops. Otherwise, the search continues in the left part of the array if the searching item is less than
the middle element, or in the right part of the array if the searching item is
greater than the middle element. This goes on until either the algorithm
finds the element or there are no elements left to examine. This algorithm
always looks at the centre value, so each step discards half of the
remaining list.


Pseudo Code of Binary Search 
int binarysearch (int a[ ], int n, int key)
{
    int first = 0, last = n - 1, middle;
    while (first <= last)
    {
        middle = first + (last - first) / 2;   /* calculate middle (avoids overflow of first + last) */

        if (a[middle] == key)                  /* key is found at middle */
        {
            return middle;
        }
        else if (a[middle] > key)              /* key can only be in the left half */
        {
            last = middle - 1;
        }
        else                                   /* key can only be in the right half */
        {
            first = middle + 1;
        }
    }
    return -1;                                 /* key is not present */
}
Flow Chart for Binary Search

[Figure: flow chart starting with First = 0, Last = length − 1; then mid = (first + last)/2; decision a[mid] = key? leads to Item is found; otherwise decision key < a[mid]? leads to Last = mid − 1; the loop ends with Item is not found.]

Analysis of Binary Search
The time complexity of binary search in all three cases is given below


Best case The best case complexity is O(1), i.e., when the element to search is
the middle element of the complete array, only one comparison is
required.


Average case This search cuts the range in half every time.
$T(n) = T(n/2) + 1$
$T(n) = T(n/2^2) + 2$


Repeating this k times, we get

$T(n) = T(n/2^k) + k$

Let $n = 2^k$; then $T(n) = T(2^k/2^k) + k$

$T(n) = T(1) + k$

$T(n) = 1 + k$ (when the array has one element, the search takes O(1) time)
$T(n) = O(k)$
Since $n = 2^k$, $k = \log_2 n$, so $T(n) = O(\log_2 n)$.
Worst case In the worst case, the complexity of binary search is $O(\log_2 n)$,
because if the element is not present, the algorithm keeps halving the range until
there are no elements left to examine.
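The O(log2 n) bound can be observed by counting how many middle elements are examined. In this sketch (the names `binary_search_probes` and `max_probes` are hypothetical), a search over n sorted elements never examines more than ⌊log2 n⌋ + 1 of them, which is the value `max_probes` computes.

```c
/* Binary search on sorted a[0..n-1]; *probes counts the middle elements examined. */
int binary_search_probes(const int a[], int n, int key, int *probes)
{
    int first = 0, last = n - 1;
    *probes = 0;
    while (first <= last) {
        int middle = first + (last - first) / 2;
        (*probes)++;
        if (a[middle] == key)
            return middle;
        else if (a[middle] > key)
            last = middle - 1;      /* continue in the left half */
        else
            first = middle + 1;     /* continue in the right half */
    }
    return -1;
}

/* floor(log2 n) + 1 for n >= 1: the worst-case number of probes. */
int max_probes(int n)
{
    int k = 0;
    while (n > 0) {
        n /= 2;
        k++;
    }
    return k;
}
```

For n = 1000 the bound is 10 probes, whether or not the key is present.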


Sorting 


In Computer Science, the sorting operation is used to put the elements of a list in
a certain order, i.e., either in non-decreasing or non-increasing order.


Sorting can be of two types
• In-place sorting
• Stable sorting


In-place Sorting


An in-place sorting algorithm needs only a small amount of extra storage, apart from the input, to sort the
elements of a list; it overwrites its input with its output. Algorithms which are not in-place are called
out-of-place algorithms.


Stable Sorting 


A stable sorting algorithm maintains the relative order of records with equal
keys, i.e., elements with equal keys appear in the sorted output in the
same relative order as they were in the input.
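Stability can be seen by sorting records that share a key. This sketch uses insertion sort (a stable algorithm) on a hypothetical `Record` type whose `tag` field remembers the input order; records with equal keys keep that order because an element is shifted only past strictly larger keys.

```c
/* A record with a sort key and a tag recording its original position. */
typedef struct {
    int key;
    char tag;
} Record;

/* Insertion sort on r[0..n-1] by key only. The strict '>' test means a record
   is never moved in front of an earlier record with an equal key. */
void insertion_sort_records(Record r[], int n)
{
    for (int j = 1; j < n; j++) {
        Record cur = r[j];
        int i = j - 1;
        while (i >= 0 && r[i].key > cur.key) {
            r[i + 1] = r[i];      /* shift a larger record one place right */
            i--;
        }
        r[i + 1] = cur;
    }
}
```

Sorting (3,a), (1,b), (3,c), (2,d), (1,e) gives (1,b), (1,e), (2,d), (3,a), (3,c): b stays before e and a stays before c.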




Divide and Conquer Approach 


The divide and conquer approach is based on recursion: it works by recursively
breaking down a problem into two or more subproblems, until these become simple
enough to be solved directly. The solutions of the subproblems are then combined to
give a solution to the original problem.


Methods of Sorting 


There are different sorting techniques as given below 


Bubble Sorting 

Bubble sorting is a simple sorting algorithm that works by repeatedly
stepping through the list to be sorted, comparing each pair of adjacent
items and swapping them if they are in the wrong order. This
sorting technique goes through multiple passes over the array, and each pass
moves the largest (or smallest) remaining element to the end of the array.




Pseudo Code for Bubble Sorting 
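void BubbleSort (int a[ ], int n)
{
    int i, j, temp;
    for (i = 0; i < n - 1; i++)            /* pass number */
    {
        for (j = 0; j < n - 1 - i; j++)    /* a[n-i..n-1] is already in place */
        {
            if (a[j] > a[j + 1])           /* adjacent pair out of order: swap */
            {
                temp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = temp;
            }
        }
    }
}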


The internal loop is used to compare adjacent elements and swap
them if they are not in order. After the internal loop has finished a
pass, the largest remaining element has moved to the end (top) of the unsorted part. The
external loop is used to control the number of passes.


Flow Chart for Bubble Sorting 
Analysis of Bubble Sort
Generally, the number of comparisons between elements in bubble sort
can be stated as follows


$$(n-1) + (n-2) + \cdots + 2 + 1 = \frac{n(n-1)}{2} = O(n^2)$$


For the basic version above, the number of comparisons between elements is the
same in every case (worst, best or average); only the number of swaps changes.
If the sort stops as soon as a pass makes no swaps, an already sorted array needs
only one pass of n − 1 comparisons, which gives the O(n) best case.


The time complexity of bubble sort in all three cases is given below
• Best case O(n) • Average case O(n²) • Worst case O(n²)
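The O(n) best case assumes the early-exit version of bubble sort. A minimal sketch in C (the name `bubble_sort_early_exit` is hypothetical) that also reports how many passes it made:

```c
#include <stdbool.h>

/* Bubble sort on a[0..n-1] that stops as soon as a pass makes no swap.
   Returns the number of passes performed. */
int bubble_sort_early_exit(int a[], int n)
{
    int passes = 0;
    for (int i = 0; i < n - 1; i++) {
        bool swapped = false;
        passes++;
        for (int j = 0; j < n - 1 - i; j++) {
            if (a[j] > a[j + 1]) {
                int temp = a[j];
                a[j] = a[j + 1];
                a[j + 1] = temp;
                swapped = true;
            }
        }
        if (!swapped)          /* no swaps in this pass: array is sorted */
            break;
    }
    return passes;
}
```

On an already sorted array it makes one pass (n − 1 comparisons); on a reverse sorted array of 5 elements it makes 4 passes.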


Insertion Sorting 
Insertion sort makes a single left-to-right pass through the array, inserting each
element into its place among the elements before it. It is a simple and efficient
sorting algorithm for small (or nearly sorted) arrays.


Key Points


+ This algorithm works the way you might sort a hand of playing cards.

+ We start with an empty left hand [sorted array] and the cards face down on 
the table [unsorted array]. 

+ Then, remove one card [key] at a time from the table [unsorted array] and 
insert it into the correct position in the left hand [sorted array]. 

+ To find the correct position for the card, we compare it with each of the cards 
already in the hand, from right to left. 


Pseudo Code of Insertion Sort 
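void insertionsort (int a[], int n)     /* sorts a[1..n]; a[0] is not used */
{
    int i, j, key;
    for (j = 2; j <= n; j++)
    {
        key = a[j];                      /* next card to insert */
        i = j - 1;
        while (i > 0 && a[i] > key)      /* shift larger elements one place right */
        {
            a[i + 1] = a[i];
            i = i - 1;
        }
        a[i + 1] = key;                  /* insert key into its correct position */
    }
}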
Flow Chart for Insertion Sorting

[Figure: flow chart with boxes Read n numbers in array a; i = 2, j = 0; loop test i > 0 and a[i] > key.]


Analysis of Insertion Sort 


The running time depends on the input. An already sorted sequence is
easier to sort.


The time complexity of insertion sort in all three cases is given below
Worst case Input is in reverse sorted order
$$T(n) = \sum_{j=2}^{n} \Theta(j) = \Theta(n^2)$$
Average case All permutations are equally likely
$$T(n) = \sum_{j=2}^{n} \Theta(j/2) = \Theta(n^2)$$


Best case If the array is already sorted
T(n) = O(n)


Merge Sort 


Merge sort is a comparison based sorting algorithm. Merge sort is a stable
sort, which means that the standard implementation preserves the input order of equal
elements in the sorted output array. Merge sort is a divide and conquer algorithm.


Its performance is independent of the initial order of the array items.
Pseudo Code of Merge Sort
void merge (int a[ ], int first, int mid, int last);

void mergeSort (int a[ ], int first, int last)
{
    if (first < last)
    {
        int mid = (first + last) / 2;
        mergeSort (a, first, mid);        // divide: sort the left half
        mergeSort (a, mid + 1, last);     // divide: sort the right half
        merge (a, first, mid, last);      // combine: merge the two sorted halves
    }
}

void merge (int a[ ], int first, int mid, int last)
{
    int c[100];              // temporary array (assumes last < 100)
    int i = first;           // index into the left half a[first..mid]
    int k = mid + 1;         // index into the right half a[mid+1..last]
    int m = first;           // index into c

    while (i <= mid && k <= last)
    {
        if (a[i] <= a[k])    // '<=' keeps equal keys in input order (stability)
            c[m++] = a[i++];
        else
            c[m++] = a[k++];
    }
    while (i <= mid)         // copy any remaining left-half elements
        c[m++] = a[i++];
    while (k <= last)        // copy any remaining right-half elements
        c[m++] = a[k++];

    for (i = first; i <= last; i++)   // copy the merged run back into a
        a[i] = c[i];
}


Key Points


There are three main steps in the merge sort algorithm
+ Divide the array into halves, until each piece has only one element.
+ Sort each half.
+ Merge the sorted halves into one sorted array.


Analysis of Merge Sort 
The merge sort algorithm always divides the array into two balanced lists.
So, the recurrence relation for merge sort is

$$T(n) = \begin{cases} 1 & \text{if } n = 1 \\ 2T(n/2) + 4n & \text{otherwise} \end{cases}$$

Using the master method,
$T(n) = 2T(n/2) + 4n$
$a = 2,\ b = 2,\ f(n) = 4n$

$$h(n) = \frac{f(n)}{n^{\log_b a}} = \frac{4n}{n^{\log_2 2}} = 4 = 4(\log_2 n)^0$$

So, by Case 3 with i = 0, $U(n) = O(\log_2 n)$ (4 is a constant, so it is ignored)

$$T(n) = n^{\log_b a} \cdot U(n) = n^{\log_2 2} \cdot O(\log_2 n) = O(n \log_2 n)$$
The time complexity of merge sort in all three cases (best, average and
worst) is given below
• Best case O(n log2 n)
• Average case O(n log2 n)
• Worst case O(n log2 n)
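The O(n log2 n) result can be cross-checked numerically. Unrolling T(n) = 2T(n/2) + 4n with T(1) = 1 for n = 2^k gives the closed form T(n) = 4n log2 n + n (this closed form is derived here, not in the text, and assumes n is a power of two). The sketch below, with hypothetical function names, evaluates both so they can be compared.

```c
/* T(1) = 1, T(n) = 2T(n/2) + 4n, evaluated directly for n a power of two. */
long t_merge(long n)
{
    if (n == 1)
        return 1;
    return 2 * t_merge(n / 2) + 4 * n;
}

/* Closed form for n = 2^k: T(n) = 4 n k + n = 4 n log2(n) + n. */
long t_merge_closed(long n)
{
    long k = 0;
    for (long m = n; m > 1; m /= 2)
        k++;                     /* k = log2(n) */
    return 4 * n * k + n;
}
```

For example, both give T(8) = 104; the dominant term 4n log2 n is what makes the bound O(n log2 n).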


Quick Sort 
It is an in-place sort, since it uses only a small auxiliary stack. It is also
known as partition exchange sort. Quick sort first divides a large list into
smaller sublists, and then recursively sorts the sublists.

[Figure: Partition data. After partitioning a[low..high], the elements < pivot lie to the left of the pivot and the elements > pivot lie to its right.]


Key Points


+ It is a divide and conquer algorithm.

+ In the quick sort algorithm, a pivot element is selected and the items are arranged so that all the items in
the lower part are less than the pivot and all those in the upper part are greater
than it.


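The pseudo code of quick sort
int partition (int a[ ], int low, int high);

void quicksort (int a[ ], int low, int high)
{
    int pivot;
    if (low < high)
    {
        pivot = partition (a, low, high);   // pivot is now at its final index
        quicksort (a, low, pivot - 1);      // sort the part left of the pivot
        quicksort (a, pivot + 1, high);     // sort the part right of the pivot
    }
}
int partition (int a[ ], int low, int high)
{
    int temp;
    int pivot = a[low];      // the first element is taken as the pivot
    int p = low;
    int q = high;
    while (p < q)
    {
        while (a[p] <= pivot && p < high)   // move p right past items <= pivot
            p++;
        while (a[q] > pivot)                // move q left past items > pivot
            q--;
        if (p < q)                          // swap the out-of-place pair
        {
            temp = a[p];
            a[p] = a[q];
            a[q] = temp;
        }
    }
    temp = a[low];           // place the pivot at its final position q
    a[low] = a[q];
    a[q] = temp;
    return q;
}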


Analysis of Quick Sort
The time complexity of quick sort in all three cases is given
below


Worst case O(n²) This happens when the pivot is the smallest
(or the largest) element. Then, one of the partitions is empty and we repeat
the procedure recursively on n − 1 elements.


The recurrence relation for the worst case
$T(n) = T(n-1) + cn, \quad n > 1$
$T(n) = T(n-2) + c(n-1) + cn$
$T(n) = T(n-3) + c(n-2) + c(n-1) + cn$
Continuing until T(1) is reached, we get
$$T(n) = T(1) + c(2 + 3 + 4 + \cdots + n) = 1 + c\left(\frac{n(n+1)}{2} - 1\right)$$

$$T(n) = O(n^2)$$


Best case O(n log2 n) The pivot is in the middle and the subarrays divide
into balanced partitions every time. So, the recurrence relation for this
situation is

$$T(n) = 2T(n/2) + cn$$
Solve this using the master method
$a = 2,\ b = 2,\ f(n) = cn$
$$h(n) = \frac{cn}{n^{\log_2 2}} = \frac{cn}{n} = c = c(\log_2 n)^0$$
$U(n) = O(\log_2 n)$
Then, $T(n) = n^{\log_2 2}\,U(n) = n \cdot O(\log_2 n)$
$$T(n) = O(n \log_2 n)$$


Average case O(n log2 n) Quick sort's average running time is much
closer to the best case than to the worst case.

Size of Left Partition | Size of Right Partition
1 | n − 1
2 | n − 2
3 | n − 3
⋮ | ⋮
n − 2 | 2
n − 1 | 1

The average value of T(i) is 1/n times the sum of T(0) through T(n − 1):

$$\frac{1}{n}\sum_{j=0}^{n-1} T(j)$$

$$T(n) = \frac{2}{n}\left(\sum_{j=0}^{n-1} T(j)\right) + cn$$

(Here we multiply by 2 because there are two partitions.)
On multiplying by n, we get
$$nT(n) = 2\left(\sum_{j=0}^{n-1} T(j)\right) + cn^2$$
To remove the summation, we rewrite the equation for n − 1
$$(n-1)\,T(n-1) = 2\left(\sum_{j=0}^{n-2} T(j)\right) + c(n-1)^2$$
and subtract:
$$nT(n) - (n-1)\,T(n-1) = 2T(n-1) + 2cn - c$$
Rearranging the terms and dropping the insignificant c,
$$nT(n) = (n+1)\,T(n-1) + 2cn$$
On dividing by n(n + 1), we get
$$\frac{T(n)}{n+1} = \frac{T(n-1)}{n} + \frac{2c}{n+1}$$
$$\frac{T(n-1)}{n} = \frac{T(n-2)}{n-1} + \frac{2c}{n}$$
$$\frac{T(n-2)}{n-1} = \frac{T(n-3)}{n-2} + \frac{2c}{n-1}$$
$$\vdots$$
$$\frac{T(2)}{3} = \frac{T(1)}{2} + \frac{2c}{3}$$
Adding the equations and cancelling equal terms,
$$\frac{T(n)}{n+1} = \frac{T(1)}{2} + 2c\sum_{j=3}^{n+1}\frac{1}{j}$$
$$T(n) = (n+1)\left(\frac{1}{2} + 2c\sum_{j=3}^{n+1}\frac{1}{j}\right)$$
The sum $\sum_{j=3}^{n+1} 1/j$ is about $\log_e n$, so
$$T(n) \approx (n+1)\,O(\log_e n)$$
$$T(n) = O(n \log_2 n)$$
Advantages of Quick Sort Method
• It is one of the fastest sorting algorithms on average.
• It does not need an additional array (only a small recursion stack), so it is called in-place sorting.


Heap Sort 


Heap sort is simple to implement and is a comparison based sort. It is an
in-place sort but not a stable sort.


Note Heap sorting uses a heap data structure. 
Heap Data Structure A heap A is a nearly complete binary tree.


Height of node = Number of edges on a longest simple path from the node
down to a leaf


Height of heap = Height of root
= Θ(log n)
e.g., We represent heaps in level order, going
from left to right. The array corresponding to the
heap in the heap sort diagram is [25, 13, 17, 5, 8, 3].

[Figure: Heap sort diagram. Root 25; its children 13 and 17; the next level holds 5, 8 and 3.]

The root of the tree is A[1], and given the index i of a node,
the indices of its parent, left child and right child can
be computed as
Parent (i)
return ⌊i/2⌋
Left (i)
return 2i
Right (i)
return 2i + 1


Heap Property 
The heap properties, based on comparing the value of a node with the value of its
parent, are as given below
Max heap In a heap, for every node i other than the root, the value of the
node is less than or equal to (at most) the value of its parent.

A[parent(i)] ≥ A[i]
Min heap In a heap, for every node i other than the root, the value of the
node is greater than or equal to the value of its parent.

A[parent(i)] ≤ A[i]
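The index formulas and the max heap property translate directly into C. This sketch keeps the text's 1-based layout (a[1] is the root and a[0] is unused); `parent`, `left`, `right` and `is_max_heap` are hypothetical helper names.

```c
#include <stdbool.h>

/* Index helpers for a heap stored in a[1..n]. */
int parent(int i) { return i / 2; }     /* floor(i/2) for positive i */
int left(int i)   { return 2 * i; }
int right(int i)  { return 2 * i + 1; }

/* Max heap property: A[parent(i)] >= A[i] for every node i other than the root. */
bool is_max_heap(const int a[], int n)
{
    for (int i = 2; i <= n; i++)
        if (a[parent(i)] < a[i])
            return false;
    return true;
}
```

The array [25, 13, 17, 5, 8, 3] from the heap example passes the check; replacing the last element by 30 breaks it, because 30 is larger than its parent 17.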


Maintaining the Heap Property 
Max heapify is important for manipulating a max heap. It is used to maintain
the max heap property.
The following conditions are considered for maintaining the heap
property
• Before max heapify, A[i] may be smaller than its children.
• The left and right subtrees of i are assumed to be max heaps.
• After max heapify, the subtree rooted at i is a max heap.
Pseudo Code for Max Heapify
void MaxHeapify (int a[ ], int i, int n)     /* heap stored in a[1..n] */
{
    int largest, temp;
    int l = 2 * i;          /* left child */
    int r = 2 * i + 1;      /* right child */
    if (l <= n && a[l] > a[i])
        largest = l;
    else
        largest = i;
    if (r <= n && a[r] > a[largest])
        largest = r;
    if (largest != i)
    {
        temp = a[largest];             /* swap a[i] with its larger child */
        a[largest] = a[i];
        a[i] = temp;
        MaxHeapify (a, largest, n);    /* continue sifting down */
    }
}


Basically, three operations are performed on a heap
• Heapify, which runs in O(log2 n) time.

• Build heap, which runs in linear time O(n).

• Heap sort, which runs in O(n log2 n) time.


We have already discussed the first operation. Next, we will
discuss how to build a heap.


Building a Heap 
Given an unordered array a[1..n], the following procedure produces a max heap
void BuildMaxHeap (int a[ ], int n)
{
    int i;
    for (i = n / 2; i >= 1; i--)    /* nodes n/2 + 1 .. n are leaves */
        MaxHeapify (a, i, n);
}
Heap Sort Algorithm
The heap sort algorithm combines the best of both merge and insertion
sort. Like merge sort, the worst case time of heap sort is O(n log2 n) and
like insertion sort, heap sort is an in-place sort.
Pseudo code for heap sort procedure
void Heapsort (int a[ ], int n)      /* sorts a[1..n] */
{
    int i, temp;

    BuildMaxHeap (a, n);

    for (i = n; i >= 2; i--)
    {
        temp = a[1];                 /* move the current maximum to position i */
        a[1] = a[i];
        a[i] = temp;
        MaxHeapify (a, 1, i - 1);    /* heap size is now i - 1 */
    }
}


Analysis of Heap Sort 
The total time for heap sort is O (n log n) in all three cases 
(i.e., best case, average case and worst case) 


Hashing 


Hashing is a common method of accessing data records. Hashing is the
process of mapping a large amount of data items to a smaller table with the
help of a hashing function.


Linear Search versus Hashing
• A linear search looks down a list, one item at a time, without jumping. In
complexity terms, this is an O(n) search: the time taken grows linearly with the length of the list.


• If we use hashing to search, the data need not be sorted. Without collision and
overflow, a search takes only O(1) time.
Hash Table


A hashing system stores records in an array called a hash table. A hash
table uses a hash function to compute an index into an array of buckets,
from which the correct value can be found.


Hash Function 


• A hash function is primarily responsible for mapping between the original data
items and the smaller table; in other words, the fixed
process that converts a key into a hash key is known as a hash function.

• A hash function is used whenever access to the table is needed.

Key -> hash function -> Index to array

• A good hash function (approximately) satisfies the assumption of simple uniform
hashing: each key is equally likely to hash to any of the m slots,
independently of where any other key has hashed to.


There are many hash function approaches, as follows


Division Method 
Mapping a key K into one of m slots by taking the remainder of K divided by m.
h(K) = K mod m


e.g., Let's say there are three numbers or keys that
we want to map into a hash table of 10 slots

123456, 123467, 123450
key 1 key 2 key 3
• 123456 % 10 = 6
The remainder is 6, so the key would be placed at the 6th index.

• 123467 % 10 = 7
The remainder is 7, so the key would be placed at the 7th index.
• 123450 % 10 = 0
The remainder is 0, so the key would be placed at the 0th index.
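The division method is a one-line function. A minimal sketch in C (the name `hash_division` is hypothetical):

```c
/* Division method: h(K) = K mod m, for a table of m > 0 slots. */
unsigned int hash_division(unsigned long key, unsigned int m)
{
    return (unsigned int)(key % m);
}
```

With m = 10 it maps 123456, 123467 and 123450 to slots 6, 7 and 0. Note that K mod 10 depends only on the last digit of K; a prime m that is not close to a power of 2 is commonly recommended so that more of the key influences the slot.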
Mid-Square Method 


Mapping a key K into one of m slots by taking some middle digits
of the value K².

$h(K)$ = the middle $\log_{10} m$ digits of $K^2$
Folding Method


Divide the key K into sections that all have the same length, except possibly the last
section. Then, add these sections together. There are two variants


• Shift folding
• Folding at the boundaries


h(K) = sum of the sections divided from K, using method (a) or (b)





















































e.g., Key = 12320324111220, section length = 3
Sections S1 = 123, S2 = 203, S3 = 241, S4 = 112, S5 = 20
(a) If we use the shift folding method, then add all the sections as they are

S1 123
S2 203
S3 241
S4 112
S5  20
   ---
   699

(b) If we use the folding at the boundaries method, then reverse every alternate section (S2 and S4) before adding

S1 123
S2 302
S3 241
S4 211
S5  20
   ---
   897
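Both folding variants can be sketched in C for a key given as a string of decimal digits. The function names `section_value` and `fold_hash` are hypothetical; when folding at the boundaries, every second section (S2, S4, ...) is reversed before it is added.

```c
#include <string.h>

/* Value of the decimal digits s[0..len-1], read forwards or reversed. */
static long section_value(const char *s, int len, int reversed)
{
    long v = 0;
    for (int i = 0; i < len; i++) {
        char c = reversed ? s[len - 1 - i] : s[i];
        v = v * 10 + (c - '0');
    }
    return v;
}

/* Folding hash of a decimal key string.
   boundary == 0: shift folding (sections are added as they are).
   boundary != 0: folding at the boundaries (every second section reversed). */
long fold_hash(const char *key, int section_len, int boundary)
{
    int n = (int)strlen(key);
    long sum = 0;
    int index = 0;                                   /* 0 for S1, 1 for S2, ... */
    for (int start = 0; start < n; start += section_len, index++) {
        int len = (n - start < section_len) ? n - start : section_len;
        int reversed = boundary && (index % 2 == 1); /* S2, S4, ... */
        sum += section_value(key + start, len, reversed);
    }
    return sum;
}
```

For the key 12320324111220 with section length 3, shift folding gives 699 and folding at the boundaries gives 897.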








Collision 


No matter what the hash function, there is the possibility that two different
keys could resolve to the same hash address. This situation is known as a
collision.


When this situation occurs, there are two simple methods to solve this
problem.


Chaining 

A chain is simply a linked list of all the elements with the same hash key.
The hash table slots no longer hold a table element (key). They now
hold the address of the first element of their chain.


h(K) = key % number of table slots
e.g., The following keys are to be hashed
36, 18, 72, 43, 6, 10, 5, 15


There are 8 slots in the hash table. So, first of all we have to calculate the
hash index of each key.

36 % 8 = 4
18 % 8 = 2
72 % 8 = 0
43 % 8 = 3
6 % 8 = 6
10 % 8 = 2
5 % 8 = 5
15 % 8 = 7

Hash table for the keys
Slot 0: 72
Slot 1: (empty)
Slot 2: 18 -> 10
Slot 3: 43
Slot 4: 36
Slot 5: 5
Slot 6: 6
Slot 7: 15
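Chaining can be sketched with an array of list heads, one per slot. This is a minimal sketch, assuming non-negative integer keys, 8 slots and the hypothetical names `ChainNode`, `chain_insert` and `chain_search`; new keys are appended at the end of their chain so that the order matches the hash table above.

```c
#include <stdlib.h>

#define SLOTS 8

/* Hypothetical chain node: each table slot points to the first node of its chain. */
typedef struct ChainNode {
    int key;
    struct ChainNode *next;
} ChainNode;

/* h(K) = K mod SLOTS; the new key is appended at the end of its chain. */
void chain_insert(ChainNode *table[], int key)
{
    ChainNode *node = malloc(sizeof *node);
    if (node == NULL)
        return;                      /* allocation failed: key is not stored */
    node->key = key;
    node->next = NULL;
    ChainNode **p = &table[key % SLOTS];
    while (*p != NULL)               /* walk to the end of the chain */
        p = &(*p)->next;
    *p = node;
}

/* Returns 1 if key is stored in the table, else 0. */
int chain_search(ChainNode *table[], int key)
{
    for (ChainNode *n = table[key % SLOTS]; n != NULL; n = n->next)
        if (n->key == key)
            return 1;
    return 0;
}
```

Inserting 36, 18, 72, 43, 6, 10, 5, 15 leaves 18 followed by 10 in slot 2 and slot 1 empty. Freeing the nodes is omitted to keep the sketch short.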


Hashing with Linear Probing 


When using the linear probing method, the item will be stored in the next
available slot in the table, assuming that the table is not already full. This is
implemented via a linear search for an empty slot, starting from the point of
collision. If the end of the table is reached during the linear search, the search
wraps around to the beginning of the table and continues from there.


Key Points
+ If an empty slot is not found before the search returns to the point of collision, the table is
full.
+ In chaining, by contrast, we put all the elements which hash to the same slot in a linked list.
e.g., Suppose the keys to be inserted or hashed are as follows
36, 18, 72, 43, 6, 10, 5, 15

and there are 8 slots in the hash table.

Now, we have to find out the hash address of each key.

36 % 8 = 4 -> inserted at 4
18 % 8 = 2 -> inserted at 2
72 % 8 = 0 -> inserted at 0
43 % 8 = 3 -> inserted at 3
6 % 8 = 6 -> inserted at 6
10 % 8 = 2 -> problem is here, because the 2nd place is already full; inserted at 5
5 % 8 = 5 -> slot 5 is already full; inserted at 7
15 % 8 = 7 -> slot 7 is already full; the search wraps around and it is inserted at 1

Hash table for the keys
Slot 0: 72
Slot 1: 15
Slot 2: 18
Slot 3: 43
Slot 4: 36
Slot 5: 10
Slot 6: 6
Slot 7: 5
So, when the hash index, what we get through the hash function is 
already full, then search the next empty slot. So, for key 10, 2nd index is 
already full, then search the 3rd, it is also full, then search the 4th, it is 
also full, the search 5th index it is empty, then fill with key 10. 


Tree
A tree is a data structure based on a hierarchical structure with a set of
nodes. It is an acyclic connected graph with a single root node, in which
each node can have zero or more child nodes.
In a tree, there are two types of node
• Internal nodes All nodes that have children are called internal nodes.
• Leaf nodes Those nodes which have no child are called leaf nodes.


Example of tree 


Tree terminology 

The depth of a node is the number of edges from the root to the node. 

The height of a node is the number of edges from the node to the deepest 
leaf. The height of a tree is the height of the root. 


Binary Tree 

A binary tree is a rooted tree in which each node has at most two children,
and each child of a node is designated as its left or right child.


Handbook Computer Science & IT 101 


A simple binary tree


Types of Binary Trees 


The types of binary trees, depending on the tree's structure, are given
below


Full Binary Tree 
A full binary tree is a tree in which every node other than the leaves has two
children, i.e., every node in a full binary tree has exactly 0 or 2 children.


Full binary tree


Complete Binary Tree 
A complete binary tree is a tree in which every level, except possibly the
last, is completely filled, and all nodes in the last level are as far left as
possible.


Complete binary tree
The number n of nodes in a binary tree of height h is at least $n = h + 1$
and at most $n = 2^{h+1} - 1$, where h is the height of the tree (the depth
of its deepest node).

Full and complete binary tree
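The two bounds follow from counting nodes level by level, with levels numbered 0 to h. This short derivation is added for illustration:

```latex
% Minimum: a single path, one node on each of the h + 1 levels
n_{\min} = \underbrace{1 + 1 + \cdots + 1}_{h+1\ \text{levels}} = h + 1
% Maximum: level i (i = 0, 1, \dots, h) holds at most 2^{i} nodes
n_{\max} = \sum_{i=0}^{h} 2^{i} = 2^{h+1} - 1
% Example, h = 3:\quad 4 \le n \le 2^{4} - 1 = 15
```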


Tree Traversal 


There are three standard ways of traversing a binary tree T with root R. 
These three ways are given below 


Preorder
• Process the root R.
• Traverse the left subtree of R in preorder.
• Traverse the right subtree of R in preorder.


Inorder
• Traverse the left subtree of R in inorder.
• Process the root R.
• Traverse the right subtree of R in inorder.


Postorder
• Traverse the left subtree of R in postorder.
• Traverse the right subtree of R in postorder.
• Process the root R.


Breadth First Traversal (BFT) 


The breadth first traversal of a tree visits the nodes in the order of their 
depth in the tree. Breadth first traversal first visits all the nodes at depth 
zero (i.e., root), then all the nodes at depth one and so on. 


Depth First Traversal (DFT) 


Depth first traversal is an algorithm for traversing or searching a tree, tree 
structure or graph. One starts at the root and explores as far as possible 
along each branch before backtracking. 


Key Points


+ A queue data structure is used for breadth first traversal.
+ A stack data structure is used for depth first traversal.
Binary Search Tree


A binary search tree is a binary tree, which may also be called an ordered
or sorted binary tree.


A binary search tree has the following properties
• The left subtree of a node contains only nodes with keys less than the
node's key.
• The right subtree of a node contains only nodes with keys greater than
the node's key.
• Both the left and right subtrees must also be binary search trees.


Binary search tree


Pseudo Code for Binary Search Tree (BST) 
The node structure for a BST is the same as for a binary tree: each node
holds data, a pointer to its left child and a pointer to its right child.

#include <stdlib.h>   /* malloc, NULL */

struct node
{
    int data;
    struct node *left;
    struct node *right;
};

/* A function that allocates a new node with the given data and
   NULL left and right pointers */
struct node *newNode(int data)
{
    struct node *node = (struct node *) malloc(sizeof(struct node));
    if (node == NULL)
        return NULL;            /* out of memory */
    node->data = data;
    node->left = NULL;
    node->right = NULL;
    return node;
}
Insert a Node into BST 


/* Given a binary search tree and a number, inserts a new node with the
   given number in the correct place in the tree */
struct node *insert(struct node *node, int data)
{
    /* If the tree is empty, return a new, single node */
    if (node == NULL)
        return newNode(data);
    else
    {
        /* Otherwise, insert the new node at the appropriate place,
           maintaining the BST property */
        if (data < node->data)
            node->left = insert(node->left, data);
        else
            node->right = insert(node->right, data);
        return node;
    }
}


In a BST, insertion always happens at the leaf level: traverse the BST,
comparing the new value with the existing keys, until you reach the right
position, then add a new leaf node holding that value.


Deletion From a Binary Search Tree 


Deletion is the most complex operation on a BST, because the algorithm 
must result in a BST. The question is “what value should replace the one 
deleted?” 


Basically, to remove an element, we follow two steps
• Search for the node to remove.
• If the node is found, run the remove algorithm.


Procedure for Deletion of a Node From BST 
There are three cases to consider 


Deleting a Leaf Node (Node with No Children) 
Deleting a leaf node is easy, as we can simply remove it from the tree. 


Remove 4 from the BST given below


Node 4 is a leaf, so simply remove this node from the tree.
Deleting a Node with One Child
Remove the node and replace it with its child.


Remove 18 from the BST given below


• In this case, the node is cut from the tree and its only subtree is linked
directly to the parent of the removed node.


Deleting a Node with Two Children 


This is the most complex case. To solve it, we use the approach given
below


• Find the minimum value in the right subtree.


• Replace the value of the node to be removed with the found minimum. Now,
the right subtree contains a duplicate.


• Apply remove to the right subtree to remove the duplicate.
e.g., Remove 12 from the given BST 


Node 12 is removed and its value is replaced with 19, the minimum value of
its right subtree. The right subtree now holds a duplicate 19, so that node
is removed.
Now, if we delete 2 from this BST, then 3 is the minimum value in the right
subtree of 2, so 2 is replaced with the value 3, and the duplicate 3 is then
removed from the right subtree.


Implementation of BST Deletion 


#include <stdio.h>    /* printf */
#include <stdlib.h>   /* free */

/* Deletes the node holding value from the BST whose root pointer is
   *treePtr (uses the struct node defined above) */
void deleteNode(struct node **treePtr, int value)
{
    struct node *current = *treePtr, *parentPtr = NULL, *temp;

    if (!current)
    {
        printf("The tree is empty or does not contain this node\n");
        return;
    }

    if (current->data == value)
    {
        /* It's a leaf node */
        if (!current->right && !current->left)
        {
            *treePtr = NULL;
            free(current);
        }
        /* the right child is NULL and the left child is not NULL */
        else if (!current->right)
        {
            *treePtr = current->left;
            free(current);
        }
        /* the left child is NULL and the right child is not NULL */
        else if (!current->left)
        {
            *treePtr = current->right;
            free(current);
        }
        /* the node has both children */
        else
        {
            temp = current->right;
            /* the right child has no left child: it is the successor */
            if (!temp->left)
                temp->left = current->left;
            /* the right child has a left child */
            else
            {
                /* find the smallest node in the right subtree */
                while (temp->left)
                {
                    /* record the parent node of temp */
                    parentPtr = temp;
                    temp = temp->left;
                }
                parentPtr->left = temp->right;
                temp->left = current->left;
                temp->right = current->right;
            }
            *treePtr = temp;
            free(current);
        }
    }
    else if (value > current->data)
    {
        /* search the right subtree */
        deleteNode(&(current->right), value);
    }
    else
    {
        /* search the left subtree */
        deleteNode(&(current->left), value);
    }
}
Average Case Performance of Binary Search Tree Operations

The Internal Path Length (IPL) of a binary tree is the sum of the depths of its
nodes. The average internal path length of binary search trees with n nodes
is O(n log n).


So, the average depth of a node is IPL / n = O(log n), and the average
complexity of find and insert operations is T(n) = O(log n).


Improving the Efficiency of Binary Search Trees 
• Keeping the tree balanced.
• Reducing the time for key comparison.


Balanced Trees 
Balancing ensures that the internal path lengths are close to the optimal
n log n. A balanced tree has the lowest possible overall height.


There are many kinds of balanced trees, e.g.,
1. AVL trees 2. B-trees


AVL Trees 


An AVL tree (named after Adelson-Velskii and Landis) is a binary search tree
with the following additional balance property.

• For any node in the tree, the heights of the left and right subtrees can
differ by at most 1.

• The height of an empty subtree is -1.

The method used to keep the tree balanced is called node rotation. AVL trees
are binary search trees that use node rotation upon inserting a new node.

In other words, an AVL tree is a binary search tree which has the following
properties

• The subtrees of every node differ in height by at most one.

• Every subtree is an AVL tree.

Every node of an AVL tree is associated with a balance factor.

Balance factor of a node = Height of left subtree - Height of right subtree

A node with balance factor -1, 0 or 1 is considered balanced.


An AVL tree is shown here 


This AVL tree is balanced and a BST. 
It is a BST but not an AVL tree because it is not balanced: at node 10, the
tree is unbalanced. So, we have to rotate this tree.


Rotations


A tree rotation is required when we have inserted or deleted a 
node which leaves the tree in an unbalanced form. 


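There are four types of rotations as given below

Left rotation (L-rotation) Suppose we have an AVL tree with root A and right
child B, and node C is inserted as the right child of B. Node A becomes
unbalanced, so we rotate left: B becomes the root, with A as its left child
and C as its right child.

Left rotation of nodes

Right rotation (R-rotation) Suppose we have an AVL tree with root C and left
child B, and node A is inserted as the left child of B. Node C becomes
unbalanced, so we rotate right: B becomes the root, with A as its left child
and C as its right child.

Right rotation of nodes

Double left-right rotation (L-R rotation) Suppose we have an AVL tree with
root C and left child A, and node B is inserted as the right child of A. First,
a left rotation at A makes B the left child of C; then a right rotation at C
makes B the root, with A and C as its children.

Double left-right rotation of nodes

Double right-left rotation (R-L rotation) Suppose we have an AVL tree with
root A and right child C, and node B is inserted as the left child of C. First,
a right rotation at C makes B the right child of A; then a left rotation at A
makes B the root, with A and C as its children.

Double right-left rotation of nodes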
Greedy Algorithms


Greedy algorithms are simple and straightforward. They are short-sighted
in their approach, in the sense that they take decisions on the basis of the
information at hand without worrying about the effect of these decisions in
the future. Their characteristics are as follows

• Easy to implement.

• Easy to invent and require a minimal amount of resources.

• Quite efficient.


Note Greedy approaches are used to solve optimization problems. 


Feasibility 

A feasible set is promising if it can be extended to produce not merely a
solution, but an optimal solution to the problem. Unlike dynamic
programming, which solves the subproblems bottom-up, a Greedy
strategy usually progresses in a top-down fashion, making one Greedy
choice after another and reducing each problem to a smaller one.


Optimization Problems 


An optimization problem is one in which you want to find not just a solution, 
but the best solution. A Greedy algorithm sometimes works well for 
optimization problems. 


Examples of Greedy Algorithms 


The examples based on Greedy algorithms are given below 


Activity Selection Problem 

If there are n activities, S = {1, 2, ..., n}, where S is the set of activities.
Activities are to be scheduled to use some resource, where each activity i
has a start time s_i and a finish time f_i with s_i < f_i. Two activities i
and j are non-interfering if their start-finish intervals do not overlap, i.e.,
$[s_i, f_i) \cap [s_j, f_j) = \emptyset$.

Activities i and j are compatible if their time periods are disjoint. In this
algorithm, the activities are first sorted in increasing order of their finish
time. With the Greedy strategy, we first select the activity with the earliest
finish time and schedule it, then skip all activities that are not compatible
with the currently selected activity, and repeat this process until all the
activities are considered.
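Algorithm for Activity Selection Problem
Activity_Selection_Problem (S, f, i, j)
{
    m ← i + 1
    while m < j and S_m < f_i        // skip activities that start before activity i finishes
        do m ← m + 1
    if m < j
        then return {m} ∪ Activity_Selection_Problem (S, f, m, j)
    else
        return ∅
}

Here S_m is the start time of activity m. The initial call is
Activity_Selection_Problem (S, f, 0, n + 1), using a fictitious activity 0
with f_0 = 0 and the sentinel j = n + 1.

The running time of this algorithm is Θ(n), assuming that the
activities are already sorted by finish time.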


Fractional Knapsack Problem 


There are n items; the i-th item is worth v_i dollars and weighs w_i pounds,
where v_i and w_i are integers. Select items to put in the knapsack with total
weight ≤ W, so that the total value is maximised. In the fractional knapsack
problem, fractions of items are allowed.


Key Points
+ To solve the fractional problem, we compute the value per weight (v_i / w_i)
for each item.
+ According to the Greedy approach, the item with the largest value per weight
is taken first.
+ This process continues until we cannot carry any more.


Algorithm of Fractional Knapsack 
Greedy_Fractional_Knapsack (G, n, W)
{
    /* G is already sorted by the value per weight of the items,
       in decreasing order */
    for i ← 1 to n do f_i ← 0
    R ← W; i ← 1
    while ((R > 0) and (i ≤ n)) do
        if (R ≥ w_i) then f_i ← 1; R ← R - w_i
        else f_i ← R / w_i; R ← 0
        i ← i + 1
    end while
    return F
}

where F is the filling pattern, i.e., F = {f_1, f_2, ..., f_n}, and 0 ≤ f_i ≤ 1
denotes the fraction of the i-th item that is taken.


Analysis 


If the items are already sorted into decreasing order of value per
weight, then the algorithm takes O(n) time; otherwise, sorting first adds
O(n log n).


Task Scheduling Problem 


There is a set of n jobs, S = {1, 2, ..., n}; each job i has a deadline d_i ≥ 0
and a profit p_i ≥ 0. We need one unit of time to process each job, and we
earn the profit p_i only if job i is completed by its deadline. To solve the
task scheduling problem, first sort the jobs by profit p_i into non-increasing
order.

After sorting, take an array of time slots whose size is the maximum
deadline. Examine the jobs in sorted order and add job i to the array if it
can still be completed by its deadline: assign i to the slot [r - 1, r], where
r is the largest integer such that 1 ≤ r ≤ d_i and the slot [r - 1, r] is still
free. This process is repeated until all jobs are examined.


Note The time complexity of the task scheduling algorithm is O(n^2).


Minimum Spanning Tree Problem 

Spanning Tree A spanning tree of a graph is any tree that includes every
vertex in the graph. More formally, a spanning tree of a graph G is a
subgraph of G that is a tree and contains all the vertices of G. Any two
vertices in a spanning tree are connected by a unique path. The number of
spanning trees in the complete graph K_n is $n^{n-2}$ (Cayley's formula).

Minimum Spanning Tree A Minimum Spanning Tree (MST) of a weighted
graph G is a spanning tree of G whose total edge weight is minimum.


There are two well-known Greedy algorithms to find the minimum spanning
tree of an undirected weighted graph.


Kruskal’s Algorithm 


Kruskal’s algorithm is a Greedy algorithm. It starts with each vertex being
its own component and repeatedly merges two components into one by
choosing the light edge that connects them; that is why this algorithm is
called an edge-based algorithm. According to this algorithm

Let G = (V, E) be a connected, undirected, weighted graph.

Scan the set of edges in increasing order by weight. An edge is selected
such that

• Acyclicity is maintained.

• It has the minimum weight among the remaining edges.

• The algorithm terminates when the tree T contains n - 1 edges.

A disjoint-set data structure is used to determine whether an edge
connects vertices in different components.

Data structures Before formalizing the above idea, we must review the
disjoint-set data structure.


MAKE_SET(v) Creates a new set whose only member is pointed to by v.
Note that for this operation, v must not already be in a set.


FIND_SET(v) Returns a pointer to the representative of the set containing v.


UNION (u, v) Unites the dynamic sets that contain u and v into a new set
that is the union of these two sets.


Algorithm for Kruskal 


Start with an empty set A and select at every stage the shortest edge that 
has not been chosen or rejected, regardless of where this edge is situated 
in the graph. 
KRUSKAL (V, E, w)
{
    A ← ∅
    for each vertex v ∈ V
        do MAKE_SET (v)
    sort E into non-decreasing order by weight w
    for each (u, v) taken from the sorted list
        do if FIND_SET (u) ≠ FIND_SET (v)
            then A ← A ∪ {(u, v)}
                 UNION (u, v)
    return A
}


Analysis of Kruskal’s Algorithm 
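The total time taken by this algorithm to find the minimum spanning tree is
O(E log E), dominated by sorting the edges. Since E < V^2, log E = O(log V),
so this is also O(E log V).


If the edges are already sorted, the remaining disjoint-set operations take
nearly linear time, O(E α(V)), when union by rank and path compression are
used.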
Prim’s Algorithm

Like Kruskal’s algorithm, Prim’s algorithm is based on the generic minimum
spanning tree algorithm. The idea of Prim’s algorithm is to grow a single tree
from a starting vertex, adding at each step the lightest edge that connects
the tree to a vertex not yet in it. Prim’s algorithm has the property that the
edges in the set A always form a single connected tree.


In this algorithm, we begin with some vertex v of the given graph G = (V, E),
which forms the initial set of vertices A. Then, in each iteration, we choose a
minimum weight edge (u, v) connecting a vertex v in the set A to a vertex u
outside the set A. Then, vertex u is brought into A. This process is repeated
until a spanning tree is formed.


This algorithm uses a priority queue Q.
• Each object in Q is a vertex in V - V_A.
• The key of v is the minimum weight of any edge (u, v), where u ∈ V_A.
• Then, the vertex returned by EXTRACT_MIN is v such that there exists
u ∈ V_A and (u, v) is a light edge crossing the cut (V_A, V - V_A).


Algorithm for Prim 
PRIM (V, E, w, r)
{
    Q ← ∅
    for each u ∈ V
        do key[u] ← ∞
           π[u] ← NIL
           INSERT (Q, u)
    DECREASE_KEY (Q, r, 0)
    while Q ≠ ∅
        do u ← EXTRACT_MIN (Q)
           for each v ∈ Adj[u]
               do if v ∈ Q and w(u, v) < key[v]
                   then π[v] ← u
                        DECREASE_KEY (Q, v, w(u, v))
}


Analysis of Prim’s Algorithm 
The time complexity of Prim’s algorithm is O(E log V) when the priority queue Q is implemented as a binary heap.
Dynamic Programming Algorithms


The dynamic programming approach to algorithm design solves problems by
combining the solutions to subproblems, as in the divide and conquer
approach. Compared to divide and conquer, dynamic programming is more
powerful when the subproblems overlap, because it solves each subproblem
only once and stores its solution in a table for reuse.


Dynamic programming is a stage-wise search method suitable for 
optimization problems whose solutions may be viewed as the result of a 
sequence of decisions. 


Greedy versus Dynamic Programming 


• For many optimization problems, a solution using dynamic programming can be
unnecessarily costly. The Greedy algorithm is simple: each step chooses the
locally best option. The drawback of the Greedy method is that the computed
global solution may not always be optimal.

• Both techniques are optimization techniques and both build a solution from a
collection of choices of individual elements. Dynamic programming is a powerful
technique but it often leads to algorithms with higher running times.

• The Greedy method typically leads to simpler and faster algorithms.

• The Greedy method computes its solution by making its choices in a serial,
forward fashion, never looking back or revising previous choices.

• Dynamic programming computes its solution bottom-up by synthesising it from
smaller subsolutions, trying many possibilities and choices before it arrives at
the optimal set of choices.


Approaches of Dynamic Programming 
Some problems solved with dynamic programming are given below


0-1 Knapsack 
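The idea is the same as in the fractional knapsack problem, but the items may
not be broken into smaller pieces, so we may decide either to take an item or
to leave it, but we may not take a fraction of an item. The Greedy approach
(largest value per weight first) does not guarantee an optimal solution here,
so a dynamic programming algorithm is used.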


Single Source Shortest Paths 

In general, the shortest path problem is to determine one or more shortest
paths between a source vertex and a target vertex in a graph with a given
set of edges. In the single source version, we are given a directed graph
G = (V, E) with non-negative edge weights and a distinguished source vertex
s ∈ V. The problem is to determine the distance from the source vertex to
every other vertex in the graph.
Dijkstra’s Algorithm


Dijkstra’s algorithm solves the single source shortest path problem on a
weighted directed graph with non-negative edge weights. It is a Greedy
algorithm. Starting at the source vertex s, Dijkstra’s algorithm grows a tree T
that spans all vertices reachable from s.


Algorithm for Dijkstra 

DIJKSTRA (V, E, w, s)
{
    INITIALIZE_SINGLE_SOURCE (V, s)
    S ← ∅
    Q ← V                        // insert all vertices into Q
    while Q ≠ ∅
        do u ← EXTRACT_MIN (Q)
           S ← S ∪ {u}
           for each vertex v ∈ Adj[u]
               do RELAX (u, v, w)
}

INITIALIZE_SINGLE_SOURCE (V, s)
{
    for each vertex v ∈ V
        do d[v] ← ∞
           π[v] ← NIL
    d[s] ← 0
}

RELAX (u, v, w)
{
    if d[v] > d[u] + w(u, v)
        then d[v] ← d[u] + w(u, v)
             π[v] ← u
}

We have two sets of vertices
S = vertices whose final shortest path weights are determined
Q = priority queue = V - S
Database Management System


Database Management System 


A database management system is a collection of interrelated data and a
set of programs to access those data. It manages large amounts of data
and supports efficient access to them.


Data

Known facts that can be stored or recorded are called data.

Data can be of two types

Persistent Data It continues to exist even when the system is not active, so
it must be stored in secondary memory.


Transient Data It is created while an application is running and is not
needed when the application has terminated.


Database 

Database is a collection of related data with 

• A logically coherent structure (can be characterised as a whole).

• Some inherent meaning (represents some partial view of a portion of the
real world).

• Some specific purpose and an intended group of users and
applications.

• A largely varying size (from a personal phone book directory to the phone
book directory of a city or state).

• A scope or content of varying breadth (from a personal list of addresses to
a multimedia encyclopedia).

• A physical organisation of varying complexity (from a manual personal
list, managed with simple files, to huge multiuser databases with
geographically distributed data and access).


Formal Definition of Data 


Logically coordinated objectives: data is defined once for a community of users and
accessed by various applications with specific needs.


Metadata: data types, intension, catalog, directory, data dictionary
Data: instances, occurrences, extension, population

The information contained in a database is represented on two levels 
(i) Data (which is large and is being frequently modified) 
(ii) Structure of data (which is small and stable in time) 


Basic Concept of DBMS 


It is a collection of general purpose, application independent programs
providing services to

• Define the structure of a database, i.e., the data types and constraints that
the data will have to satisfy.

• Manage the storage of data, safely for long periods of time, on some
storage medium controlled by the DBMS.

• Manipulate a database, with efficient user interfaces to query the
database to retrieve specific data, update the database to reflect
changes in the world and generate reports from the data.

• Manage database usage for users with their access rights, performance
optimisation, sharing of data among several users and security from
accidents or unauthorised use.

• Monitor and analyse database usage.

A Database Management System (DBMS) provides efficient, reliable,
convenient and safe multiuser storage of and access to massive amounts
of persistent data.
Database system
  Users/Programmers
  Application programs/queries
  DBMS software: software to process queries/programs; software to access stored data
  Stored data and metadata

Architecture of database and DBMS software









































Key People Involved in a DBMS 


The following are the key people involved in a DBMS


DBMS Implementer
Person who builds the system.


Database Designer

Person responsible for preparing external schemas for applications,
identifying and integrating user needs into a conceptual (or community or
enterprise) schema.


Database Application Developer
Person responsible for implementing database application programs that
facilitate data access for end users.


Database Administrator

Person responsible for defining the internal schema and sub-schemas (with
database designers), specifying mappings between schemas, monitoring
database usage and supervising DBMS functionality (e.g., access control,
performance optimisation, backup and recovery policies, conflict
management).


End Users
There are two categories of end users


• Casual users Occasional, unanticipated access to the database,
e.g., tourists, managers.
• Parametric users Users who query and update the database through
fixed programs (invoked by non-programmer users), e.g., banking.


Important Functions on a Database 

• Structure Definition Declare files or relations (i.e., tables + data types).
e.g., Employee (Emp_Name, Emp_No, Salary, Dept_ID)
Dept (Dept_Name, Dept_ID)

• Population Input data about specific employees and departments.

• Querying List the names of employees who are getting a salary of more than
3 lakh in the marketing department.

• Reporting We can prepare a report having employee names with their department
names using a join on the employee table and the dept table.

• Modification and Update of Population We can insert a new department into
the table department and we can also update the salary of a particular employee.

• Modification of Structure and Schema We can create a new relation (i.e., table)
for (Emp_No, Skills). We can add an address attribute to the relation employee.


Instances and Schemas and Data Model 


Data Model It provides mechanisms (languages) for defining data 
structures and operations for retrieval and modification of data. 


Schema/Intension The overall design of the database, or the description of
data in terms of a data model. Schemas do not change frequently. In
addition to data structures, the schema also comprises the definition of
domains for data elements (attributes) and the specification of constraints.
The intension is a constant value that gives the name and structure of a
table and the constraints laid on it.


Instance/Extension

Databases change over time because records are inserted and deleted
frequently. The collection of information stored in the database at a
particular moment is called an instance of the database. The extension is
the set of tuples present in a table at any instant, so the number of tuples
is dependent on time.


Data Abstraction 


Data abstraction hides the complexity from users through several levels of
abstraction, to simplify their interactions with the system, as many
database system users are not computer trained.
Levels of Data Abstraction
There are three levels of data abstraction as given below


Physical Level

It is the lowest level of abstraction and describes in detail how the data are
actually stored, using complex low-level data structures. At the physical
level, an employee record can be described as a block of consecutive
storage locations (e.g., words or bytes).


Logical Level 

It is the next higher level of abstraction and describes what data are stored 
and what relationships exist among those data. At the logical level, each 
such record is described by a type definition and the interrelationship of 
these record types is defined as well. Database administrators usually work 
at this level of abstraction. 


View Level 
It is the highest level of abstraction and describes only part of the entire 
database and hides the details of the logical level. 


• At the view level, several views of the database are defined and database
users see these views.


• The views also provide a security mechanism to prevent users from
accessing certain parts of the database. e.g., tellers in a bank can see
information on customer accounts but they cannot access information
about salaries of employees.


Key Points


+ Even though the logical level uses simpler structures, complexity remains
because of the variety of information stored in a large database.

+ Many users of the database system do not need all the stored information.

+ Many classes of users need to access only a part of the database.

+ The view level of abstraction exists to simplify their interaction with the
system. The system may provide many views for the same database.


Schema 


A schema, also known as a database schema, is the logical design of the
database, whereas a database instance is a snapshot of the data in the
database at a given instant of time. A relational schema consists of a list
of attributes and their corresponding domains.


Types of Schemas 
It can be classified into three parts, according to the levels of abstraction 


Physical/Internal Schema Describes the database design at the physical 
level. 


Logical/Conceptual Schema/Community User View Describes the 
database design at the logical level. 


Sub-schemas/View/External Schema Describes different views of the
database. Views may be queried, combined in queries with base relations
and used to define other views; in general, they cannot be updated freely.


Schema Architecture 


Data Independence 

Possibility to change the schema at one level without having to change it at 
the next higher level (nor having to change programs that access it at that 
higher level). 


There are two parts of data independence 


Logical Data Independence Refers to the immunity of the external
schema to changes in the conceptual schema (i.e., community schema).


e.g., adding a new record type or field to the conceptual schema should not
require changes to existing external schemas.


External level:    External view 1  ...  External view n
                         (external/community mapping)
Community level:   Community schema
                         (community/internal mapping)
Internal level:    Internal schema
                   Stored database

Architecture of schema
Physical Data Independence Refers to the immunity of the conceptual
schema to changes in the internal schema. e.g., adding an index should not
require any change to the conceptual schema or to existing programs.


Database Functions and Application Functions 
Application Program Functions 
To be programmed in application programs. 


Database Functions or DBMS Functions 
Supplied by the DBMS and invoked in application programs. 


Database Design Phases 














Data Analysis Phase focuses on attributes, entities, relationships and
integrity rules.

Logical Design Phase focuses on tables, columns, primary keys and
foreign keys.

Physical Design Phase focuses on DDL for tables, indexes and table
spaces.









Transaction 


A transaction is one execution of a user program (executing the same
program several times corresponds to several transactions). It is the basic
unit of change as seen by the DBMS.

OLTP (On-Line Transaction Processing) applications, e.g., banking and
airline reservation systems, have multiple simultaneous users.


Functionality of DBMS 


1. Concurrency control 
2. Backup and recovery 
3. Redundancy management 
4. Access control 
5. Performance optimisation 
6. Metadata management 
7. Active features (rules, triggers)


Concurrency Control 


It is responsible for ensuring the correctness of competing accesses to the 
same data. One or more Structured Query Language (SQL) statements are 
treated together as one single unit (a transaction). 

Correctness of data requires four desirable properties, the ACID properties 
(Atomicity, Consistency, Isolation and Durability). e.g., concurrency control 
ensures that there are no two simultaneous withdrawals from the same bank 
account and no multiple reservations of the same airplane seat.
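The bank-account example can be sketched in Python, assuming a single process in which a threading.Lock stands in for the exclusive lock a DBMS would take on the account row. The names balance and withdraw and the amounts are hypothetical. The check-then-debit step is made indivisible, so two concurrent withdrawals of 80 from a balance of 100 cannot both succeed:

```python
import threading

# Hypothetical shared account; the Lock stands in for the exclusive lock
# a DBMS would place on the account row during a withdrawal.
balance = 100
lock = threading.Lock()
results = []

def withdraw(amount):
    """Check the balance and debit it as one indivisible step."""
    global balance
    with lock:  # without this, both threads could pass the check before either debits
        if balance >= amount:
            balance -= amount
            results.append("ok")
        else:
            results.append("refused")

# Two "users" try to withdraw 80 from the same account at the same time
threads = [threading.Thread(target=withdraw, args=(80,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(balance, sorted(results))  # 20 ['ok', 'refused']
```

A real DBMS achieves the same effect with locking or other concurrency-control protocols inside the engine rather than in application code.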
Backup and Recovery 
• Facilities for recovering from hardware and software failures. 
• If the computer system fails during a complex update program, the 
database must be restored to its state before the program started, or the 
program must be resumed where it was interrupted so that its full effect is 
recorded in the database. 
• In a multiuser environment, recovery is more complex and more important.
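The first recovery option, restoring the state that existed before the program started, can be illustrated with Python's built-in sqlite3 module. This is a sketch only: the failure is simulated with an exception rather than a real crash, and the account table and transfer function are hypothetical. When a sqlite3 connection is used as a context manager, it commits if the block completes and rolls back if the block raises:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 500), (2, 300)])
conn.commit()

def transfer(amount, fail_midway):
    """Move money from account 1 to account 2 as a single unit of work."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = 1", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure after the debit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = 2", (amount,))

def balances():
    return conn.execute("SELECT balance FROM account ORDER BY id").fetchall()

try:
    transfer(200, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances()   # the half-done debit has been undone

transfer(200, fail_midway=False)
after_success = balances()   # both updates are recorded

print(after_failure, after_success)  # [(500,), (300,)] [(300,), (500,)]
```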


Redundancy Management 


Redundancy means storing several copies of the same data. Redundant 
entries are frequent in traditional file processing; a goal of the database 
approach was to control redundancy as much as possible. 


Problems with redundancy include waste of storage space, duplication of 
effort to perform a single conceptual update and the danger of introducing 
inconsistency if multiple updates are not coordinated.


Access Control 

It is responsible for enforcing security and authorisation over operations 
(e.g., who can create new bank accounts) and over data (e.g., which bank 
accounts I can see). 

It is all about who accesses what data, to do what, when, from where etc. 
Some examples of access privileges are as follows 

• To create a database 
• To authorise (grant) additional users to access the database, access 
some relations, create new relations and update the database. 
• To revoke privileges.


Key Points 

• In a multiuser database, access control is mandatory, e.g., for confidentiality. 
• There are various ways to access data (e.g., read only, read and update). 
• The data dictionary holds information about users and their access privileges 
(e.g., name and password).


Performance Optimisation 

It involves performing physical reorganisations to enhance performance, e.g., 
adding an index, dropping an index or sorting a file. 

Performance optimisation is made possible by physical data independence 
and by high-level data models, which allow user programs to be optimised 
by the DBMS software.
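The sketch below uses Python's sqlite3 module to add an index as a purely physical change. The customer table and the index name idx_city are hypothetical. The query text and its result stay the same, which is physical data independence at work, while SQLite's EXPLAIN QUERY PLAN output shows the optimiser switching from a full scan to an index search. The exact wording of the plan text varies between SQLite versions, so the comments only describe it loosely:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer (name, city) VALUES (?, ?)",
                 [("Ram", "Delhi"), ("Shyam", "Pune"), ("Mohan", "Delhi")])

QUERY = "SELECT name FROM customer WHERE city = ? ORDER BY name"

def plan(sql, params):
    """Join the 'detail' column (index 3) of the EXPLAIN QUERY PLAN rows."""
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params))

before_plan = plan(QUERY, ("Delhi",))
before_rows = conn.execute(QUERY, ("Delhi",)).fetchall()

# Physical reorganisation only: no change to the table definition or the query
conn.execute("CREATE INDEX idx_city ON customer(city)")

after_plan = plan(QUERY, ("Delhi",))
after_rows = conn.execute(QUERY, ("Delhi",)).fetchall()

print(before_plan)   # a full scan of customer
print(after_plan)    # a search of customer using idx_city
print(before_rows == after_rows)  # True
```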
Metadata Management 

Metadata means data about data. Metadata is maintained in a special 
database, which we can call the system catalog or data dictionary. It involves 
storing information about other information. With different types of media 
being used, references to the location of the data can allow management 
of diverse repositories.
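In SQLite, for example, the system catalog is the table sqlite_master, and PRAGMA table_info reports column metadata. In this minimal Python sketch, the student table and the idx_name index are hypothetical. It reads the catalog with ordinary queries, just like any other table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, cgpa REAL)")
conn.execute("CREATE INDEX idx_name ON student(name)")

# sqlite_master is SQLite's system catalog: one row per schema object
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name").fetchall()

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]

print(catalog)   # [('index', 'idx_name'), ('table', 'student')]
print(columns)   # [('roll_no', 'INTEGER'), ('name', 'TEXT'), ('cgpa', 'REAL')]
```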


Active Features 


Active features are rules and triggers, i.e., actions that the DBMS executes 
automatically when a specified event (e.g., an insert, update or delete) occurs.
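A trigger is the most common active feature. In the following Python sqlite3 sketch, the account and audit tables and the trigger name are hypothetical. Every balance update automatically writes an audit row, without the application program doing anything extra:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER);
CREATE TABLE audit   (acc_id INTEGER, old_balance INTEGER, new_balance INTEGER);

-- Active rule: fires automatically after every change to balance
CREATE TRIGGER log_balance_change
AFTER UPDATE OF balance ON account
BEGIN
    INSERT INTO audit VALUES (OLD.id, OLD.balance, NEW.balance);
END;

INSERT INTO account VALUES (1, 500);
""")

conn.execute("UPDATE account SET balance = 350 WHERE id = 1")
log = conn.execute("SELECT * FROM audit").fetchall()
print(log)  # [(1, 500, 350)]
```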


Types of Database Model 


The database model can be of three types as given below 

(a) Hierarchical model 
(b) Network model 
(c) Relational model 

[Figure: Database models; in the relational model, data is shown as a table whose columns are attributes and whose cells hold values]


Relational Database Management 
System (RDBMS) 


A system in which users access data through relations, i.e., data must be 
available in tabular form as a collection of tables, where each table 
consists of a set of rows and columns. It provides a variety of relational 
operators so that we can manipulate the data in tabular form.


Features of RDBMS 


There are many features of RDBMS 

• We can create multiple relations (tables) and feed data into them. 
• Provides an interactive query language. 
• We can retrieve information from more than one table using the join concept. 
• Provides a catalog or dictionary, which consists of system tables.
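The join feature can be sketched with Python's sqlite3 module. The employee and department tables and their rows are hypothetical examples and do not come from any specific system:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, e_name TEXT, dept_id INTEGER);
INSERT INTO department VALUES (10, 'Sales'), (20, 'HR');
INSERT INTO employee VALUES (1, 'Shyam', 10), (2, 'Ram', 20), (3, 'Hari', 10);
""")

# Join: pair each employee row with the department row having the same dept_id
rows = conn.execute("""
    SELECT e.e_name, d.dept_name
    FROM employee AS e JOIN department AS d ON e.dept_id = d.dept_id
    ORDER BY e.e_name
""").fetchall()
print(rows)  # [('Hari', 'Sales'), ('Ram', 'HR'), ('Shyam', 'Sales')]
```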


RDBMS Vocabulary 


Relation A table. 

Attribute A column in a table. 

Degree Number of attributes in a relation. 

Cardinality Number of tuples in a table. 

Domain or Type A pool of values from which specific attributes draw their values. 
NULL Special value for ‘unknown’ or ‘undefined’. 


Relational Data Model It provides mechanisms (languages) for defining data structures, 
operations for retrieval and modification of data and integrity constraints. 


Keys 
An attribute or set of attributes whose values uniquely identify each entity in 
an entity set is called a key for that entity set. 


Different types of keys are as follows 


Super Key 


It is a set of one or more attributes that allows us to identify an entity 
uniquely in the entity set. e.g., the Customer_Id attribute of the entity set 
customer is sufficient to distinguish one customer entity from another. Thus, 
Customer_Id is a super key. Similarly, the combination of Customer_Name 
and Customer_Id is a super key for the entity set customer. The 
Customer_Name attribute of customer is not a super key, because several 
people might have the same name.
Candidate Key 

As we saw, a super key may contain extraneous attributes (i.e., 
Customer_Name in the above case). If K is a super key, then any superset 
of K is also a super key. We are often interested in super keys 
for which no proper subset is a super key. Such minimal super keys are 
called candidate keys. Suppose that a combination of Customer_Name 
and Customer_Street is sufficient to distinguish among members of the 
customer entity set. Then, both {Customer_Id} and {Customer_Name, 
Customer_Street} are candidate keys. Although the attributes 
Customer_Id and Customer_Name together can distinguish customer 
entities, their combination does not form a candidate key, since the 
attribute Customer_Id alone is a candidate key.
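These definitions can be tried out on a small, hypothetical customer instance in Python. One caveat applies: a key is a property of the entity set, so a sample instance can only show that a set of attributes is not a key. The candidate keys found below match the text only because the sample data was chosen to agree with it:

```python
from itertools import combinations

# Hypothetical customer instance; attribute names follow the text
attrs = ("Customer_Id", "Customer_Name", "Customer_Street")
rows = [
    (1, "Ram",   "MG Road"),
    (2, "Ram",   "Park Street"),
    (3, "Shyam", "MG Road"),
]

def is_superkey(cols):
    """True if no two rows share the same values on cols."""
    idx = [attrs.index(c) for c in cols]
    return len({tuple(r[i] for i in idx) for r in rows}) == len(rows)

def candidate_keys():
    """Minimal super keys: skip any set that contains an already-found key."""
    found = []
    for size in range(1, len(attrs) + 1):
        for cols in combinations(attrs, size):
            if is_superkey(cols) and not any(set(k) <= set(cols) for k in found):
                found.append(cols)
    return found

print(is_superkey(("Customer_Id", "Customer_Name")))  # True, but not minimal
print(is_superkey(("Customer_Name",)))                # False: 'Ram' repeats
print(candidate_keys())
# [('Customer_Id',), ('Customer_Name', 'Customer_Street')]
```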


Primary Key 

A candidate key that is chosen by the database designer as the principal 
means of identifying entities within an entity set. A key (primary, candidate 
and super) is a property of the entity set rather than of the individual 
entities. 


Alternate Key 

A table may have one or more choices for the primary key. Collectively, 
these are known as candidate keys, as discussed earlier. One is selected as 
the primary key. Those not selected are known as secondary keys or 
alternate keys.


Secondary Key 

An attribute or set of attributes that may not be a candidate key but that 
classifies the entity set on a particular characteristic. e.g., suppose the entity 
set employee has the attribute department, whose value identifies all 
instances of employee who belong to a given department.


Foreign Key 

A foreign key is generally a primary key from one table that appears as a field 
in another, where the first table has a relationship to the second. In other 
words, if we had a table TBL1 with a primary key A that is linked to a table 
TBL2, where A is also a field in TBL2, then A would be a foreign key in TBL2.


Simple and Composite Key 
Any key consisting of a single attribute is called a simple key, while that 
consisting of a combination of attributes is called a composite key. 


E-R Modeling 


The Entity-Relationship model (ER model) in software engineering is an 
abstract way to describe a database. Describing a database usually starts 
with a relational database, which stores data in tables. Some of the data in 
these tables points to data in other tables; for instance, your entry in the 
database could point to several entries, one for each of the phone numbers 
that are yours. The ER model would say that you are an entity, each phone 
number is an entity, and the relationship between you and the phone 
numbers is 'has a phone number'.


Relationship 


An association among entities, e.g., there is a relationship between 
employees and departments, which can be named Works_In.


Relationship Set 

An association of entity sets, e.g., between Employee and Department. 

• A relationship instance is an association of entity instances, e.g., 
(Shyam, Sales). 
• The same entity set can participate in different relationship sets. 
• An n-ary relationship set R relates n entity sets. 
• A relationship set involving three entity sets is known as a ternary relationship.


Entity 


Anything that exists and can be distinguished, i.e., a real-world object that 
can be distinguished from other objects, e.g., a student.


Entity Set 

A group of similar entities, e.g., all students. 

• All entities in an entity set have the same set of attributes. 
• Each entity set has a key. 
• It can be mapped to a relation easily.


Weak Entity Set and Strong Entity Set 


An entity set may not have sufficient attributes to form a primary key. Such 
an entity set is termed a weak entity set. An entity set that has a primary key 
is termed a strong entity set. For a weak entity set to be meaningful, it must 
be associated with another entity set, called the identifying or owner entity 
set. Every weak entity must be associated with an identifying entity i.e., the 
weak entity set is said to be existence dependent on the identifying entity set. 
Attribute 

Attributes are properties that describe an entity, i.e., descriptive 
properties possessed by each member of an entity set. Each attribute 
has a domain.


There are many types of attributes 


Composite Attribute 
Attributes which can be divided into component attributes, e.g., a composite 
attribute Name with component attributes First_Name, Middle_Name and Last_Name.


Derived Attribute 

The value for this type of attribute can be derived from the values of other 
related attributes or entities. For instance, let us say that the customer 
entity set has an attribute Loans Held, which represents how many loans a 
customer has from the bank. We can derive the value for this attribute by 
counting the number of loan entities associated with that customer. 
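A derived attribute is typically computed at query time rather than stored. In the following Python sqlite3 sketch, the customer and loan tables and the view name are hypothetical. A view counts each customer's loans to derive Loans Held:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE loan (loan_no INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);
INSERT INTO customer VALUES (1, 'Ram'), (2, 'Shyam');
INSERT INTO loan VALUES (101, 1, 5000), (102, 1, 2000);

-- loans_held is never stored; the view derives it by counting loan rows
CREATE VIEW customer_loans AS
SELECT c.customer_id, c.name, COUNT(l.loan_no) AS loans_held
FROM customer AS c LEFT JOIN loan AS l ON l.customer_id = c.customer_id
GROUP BY c.customer_id, c.name;
""")

result = conn.execute("SELECT * FROM customer_loans ORDER BY customer_id").fetchall()
print(result)  # [(1, 'Ram', 2), (2, 'Shyam', 0)]
```

The LEFT JOIN keeps customers with no loans, and COUNT(l.loan_no) ignores the NULLs it produces, so such customers get 0.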


Descriptive Attribute 

If a relationship set also has some attributes associated with it, we link 
these attributes to that relationship set. e.g., consider a relationship set 
depositor with entity sets customer and account. We could associate the 
attribute Access_Date with that relationship to specify the most recent date 
on which a customer accessed an account.


Single Valued Attribute 
An attribute which has only one value, e.g., the Employee_Number attribute 
for a specific employee entity refers to only one employee number.


Multi Valued Attribute 

Attributes which can have 0, 1 or more values, e.g., an employee entity 
set with the attribute Phone_Number. An employee may have zero, one or 
several phone numbers, and different employees may have different 
numbers of phones. 

[Figure: ER diagram (entity set Account)] 

Notations/Shapes in ER Modeling 

Rectangle represents entity type. 
Double/bold rectangle represents weak entity type. 
Diamond represents relationship type. 
Double/bold diamond represents weak relationship type. 
Ellipse represents attribute type. 
Double ellipse represents multivalued attribute. 
Dashed ellipse denotes derived attribute. 
Line links attributes to entity sets and entity sets to relationship sets. 
Double lines indicate total participation of an entity in a relationship set, 
i.e., each entity in the entity set occurs in at least one relationship in that 
relationship set. 


Mapping Cardinalities/Cardinality 
Ratio/Types of Relationship 
Expresses the number of entities to which another entity can be 
associated via a relationship set. For a binary relationship set R between 
entity sets A and B, the mapping cardinality must be one of the following

One to One 
An entity in A is associated with at most one entity in B and an entity in B is 
associated with at most one entity in A. 


One to Many 
An entity in A is associated with any number (zero or more) of entities in 
B. An entity in B, however, can be associated with at most one entity in A. 
Many to Many 
An entity in A is associated with any number (zero or more) of entities in 
B and an entity in B is associated with any number (zero or more) of entities 
in A. 

[Figure: One to one relationship between Employee and ID_Card] 
[Figure: One to many relationship: Student (1) Enrolled (M) Course] 
[Figure: Many to many relationship between Student (M) and Test (M)] 

Extended E-R Features 


The extended E-R features are given below 


Specialization 
Consider an entity set person with attributes name, street and city. A 
person may be further classified as one of the following 


(i) Customer (ii) Employee 
Each of these person types is described by a set of attributes that includes 
all the attributes of entity set person plus possibly additional attributes. The 
process of designating subgroupings within an entity set is called 
specialization. 


The specialization of person allows us to distinguish among persons 
according to whether they are employees or customers. 


The refinement from an initial entity set into successive levels of entity 
subgroupings represents a top-down design process in which distinctions 
are made explicit.


Generalization 

Generalization is simply the inverse of specialization. Common attributes 
of multiple entity sets are chosen to create a higher level entity set. If the 
customer entity set and the employee entity set have several attributes in 
common, then this commonality can be expressed by generalization.
Here, person is the higher level entity set and customer and employee are 
lower level entity sets. Higher and lower level entity sets may also be 
designated by the terms super class and subclass, respectively. The 
person entity set is the super class of the customer and employee 
subclasses.


Aggregation 

Aggregation is used when we have to model a relationship involving an entity 
set and a relationship set. Aggregation is an abstraction through which 
relationships are treated as higher level entities. 

Suppose we treat the relationship set Works_On (relating the entity sets 
employee, branch and job) as a higher level entity set called Works_On. Such an 
entity set is treated in the same manner as any other entity set. We can then 
create a binary relationship manages between Works_On and manager to 
represent who manages what tasks. Aggregation is meant to represent a 
relationship between a whole object and its component parts. 

[Figure: Example of generalization and specialization (person specialized into employee and customer); example of aggregation (Works_On treated as a higher level entity set)]


Integrity Constraints 


Necessary conditions to be satisfied by the data values in the relational 
instances so that the set of data values constitute a meaningful database. 
There are four types of integrity constraints 


Domain Constraint The value of attribute must be within the domain. 
Key Constraint Every relation must have a primary key. 


Entity Integrity Constraint Primary key of a relation should not contain 
NULL values. 
Referential Integrity Constraint In the relational model, two relations are 
related to each other on the basis of attributes. Every value of the referencing 
attribute (foreign key) must either be NULL or be available in the referenced attribute. 

[Figure: Department table (parent table) with columns Dept_Id (primary key or unique constraint), Dept_Name and Manager_Id; Employee table with columns Emp_Id (primary key), E_Name, L_Name, Dept_Id (foreign key referencing the department table) and Ph_No]
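Referential integrity can be sketched with Python's sqlite3 module using the Department/Employee layout from the figure. The column names are simplified and the rows are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON has been issued on the connection:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
CREATE TABLE department (
    dept_id    INTEGER PRIMARY KEY,
    dept_name  TEXT,
    manager_id INTEGER
);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    e_name  TEXT,
    dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
);
INSERT INTO department VALUES (10, 'Sales', NULL);
""")

conn.execute("INSERT INTO employee VALUES (1, 'Ram', 10)")    # 10 exists: accepted
conn.execute("INSERT INTO employee VALUES (2, 'Hari', NULL)")  # NULL: accepted

try:
    conn.execute("INSERT INTO employee VALUES (3, 'Mohan', 99)")  # no dept 99
    violation = None
except sqlite3.IntegrityError as exc:
    violation = str(exc)

count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(violation, count)
```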


Relational Algebra and 
Relational Calculus 


Relational model is completely based on relational algebra. It consists of a 
collection of operators that operate on relations. 

Its main objective is data retrieval. It is more operational and very 
useful for representing execution plans, while relational calculus is 
non-operational and declarative. Here, declarative means that users define 
queries in terms of what they want, not in terms of how to compute it.


Relational Query Languages 


Used for data manipulation and data retrieval. The relational model supports 
simple yet powerful query languages. To understand SQL, we need a good 
understanding of two relational query languages (i.e., relational algebra and 
relational calculus).


Basic Operations in Relational Algebra 
The operations in relational algebra are classified as follows 

Selection 

The select operation selects tuples/rows that satisfy a given predicate or 
condition. We use σ (sigma) to denote selection. The predicate/condition 
appears as a subscript to σ.
e.g., consider the below relation STUDENT 

Roll_No  Name   CGPA  Age 
30       Ram    7     20 
35       Shyam  8     21 
40       Mohan  9     22 
43       Hari   9     21 

σ_CGPA>8 (STUDENT), where CGPA > 8 is the condition predicate and STUDENT is the relation, gives 

Roll_No  Name   CGPA  Age 
40       Mohan  9     22 
43       Hari   9     21
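To make the operator concrete, a relation can be modelled as a set of tuples. The following Python sketch reproduces the σ_CGPA>8 (STUDENT) example. The select helper is hypothetical and is not a library function:

```python
# STUDENT relation from the text as a set of (Roll_No, Name, CGPA, Age) tuples
ATTRS = ("Roll_No", "Name", "CGPA", "Age")
STUDENT = {(30, "Ram", 7, 20), (35, "Shyam", 8, 21),
           (40, "Mohan", 9, 22), (43, "Hari", 9, 21)}

def select(relation, predicate):
    """sigma: keep only the tuples for which predicate(tuple-as-dict) is true."""
    return {t for t in relation if predicate(dict(zip(ATTRS, t)))}

result = select(STUDENT, lambda t: t["CGPA"] > 8)
print(sorted(result))  # [(40, 'Mohan', 9, 22), (43, 'Hari', 9, 21)]
```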


Projection 

It selects only the required/specified columns/attributes from a given 
relation/table. The projection operator eliminates duplicates (i.e., duplicate 
rows from the result relation), e.g., consider the STUDENT relation. 

π_Age (STUDENT) = 

Age 
20 
21 
22 

π_Name, CGPA (STUDENT) = 

Name   CGPA 
Ram    7 
Shyam  8 
Mohan  9 
Hari   9 

Combining selection and projection 
π_Name, CGPA (σ_CGPA>8 (STUDENT)) = 

Name   CGPA 
Mohan  9 
Hari   9
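The next sketch keeps the set-of-tuples model; its project helper is hypothetical. Building the result as a Python set is what removes duplicate rows, so age 21 appears only once. The last lines reproduce the combined selection and projection:

```python
ATTRS = ("Roll_No", "Name", "CGPA", "Age")
STUDENT = {(30, "Ram", 7, 20), (35, "Shyam", 8, 21),
           (40, "Mohan", 9, 22), (43, "Hari", 9, 21)}

def project(relation, cols):
    """pi: keep only the named columns; the set drops duplicate rows."""
    idx = [ATTRS.index(c) for c in cols]
    return {tuple(t[i] for i in idx) for t in relation}

ages = project(STUDENT, ["Age"])
print(sorted(ages))  # [(20,), (21,), (22,)]

# pi_{Name, CGPA}(sigma_{CGPA > 8}(STUDENT))
high = {t for t in STUDENT if t[ATTRS.index("CGPA")] > 8}
top = project(high, ["Name", "CGPA"])
print(sorted(top))   # [('Hari', 9), ('Mohan', 9)]
```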
Union 

It forms a relation from rows/tuples which appear in either or both of 
the specified relations. For a union operation R ∪ S to be valid, the below 
two conditions must be satisfied. 

• The relations R and S must be of the same arity, i.e., they must have the 
same number of attributes. 
• The domains of the ith attribute of R and the ith attribute of S must be the 
same, for all i.


Intersection 


It forms a relation of rows/tuples which are present in both the relations R 
and S. As with the union operation, we must ensure that both relations are 
compatible. 

R = 
Name   Age 
Ram    20 
Mohan  21 

S = 
Name   Age 
Ram    20 
Shyam  22 

R ∪ S (union operation) = 
Name   Age 
Ram    20 
Shyam  22 
Mohan  21 

R ∩ S (intersection operation) = 
Name   Age 
Ram    20 

Set Difference 

It allows us to find tuples that are in one relation but are not in another. The 
expression R − S produces a relation containing those tuples in R but not 
in S. 

R − S, i.e., π_Name, Age (R) − π_Name, Age (S) = 
Name   Age 
Mohan  21
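If R and S from the example are written as Python sets of (Name, Age) tuples, the three set operations map directly onto Python's set operators. In this sketch, only the arity part of union compatibility is checked; domain compatibility is not:

```python
# R and S from the example, as sets of (Name, Age) tuples
R = {("Ram", 20), ("Mohan", 21)}
S = {("Ram", 20), ("Shyam", 22)}

# Union compatibility, arity part only: every tuple has 2 attributes
assert all(len(t) == 2 for t in R | S)

union = R | S          # tuples in R, in S, or in both
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S

print(sorted(union))         # [('Mohan', 21), ('Ram', 20), ('Shyam', 22)]
print(sorted(intersection))  # [('Ram', 20)]
print(sorted(difference))    # [('Mohan', 21)]
```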

















Cross Product/Cartesian Product 


Assume that we have n1 tuples in R and n2 tuples in S. Then, there are n1 × n2 
ways of choosing a pair of tuples, one tuple from each relation. So, there 
will be (n1 × n2) tuples in the result relation P if P = R × S.
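The n1 × n2 count can be checked quickly in Python. The sketch uses a hypothetical S of department names and itertools.product to pair every tuple of R with every tuple of S:

```python
from itertools import product

R = {("Ram", 20), ("Mohan", 21)}             # n1 = 2 tuples
S = {("Sales",), ("HR",), ("Accounts",)}     # n2 = 3 tuples (hypothetical)

# R x S: concatenate every tuple of R with every tuple of S
P = {r + s for r, s in product(R, S)}
print(len(P))                  # 6, i.e., n1 * n2
print(("Ram", 20, "HR") in P)  # True
```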


Rename operation 
ρ_new name (Account), where Account is the relation name and the subscript is the new name.




ρ_x (E), where x is the new name for the result of E and E may be an 
expression. 
Schema Refinement/Normalization Decomposition of complex records 
into simple records. Normalization reduces redundancy using the lossless 
(non-loss) decomposition principle. 
Shortcomings of Redundancy Data redundancy causes three types of 
anomalies (insertion, update and deletion anomalies) and a storage problem 
(i.e., wastage of storage). 
Decomposition Splitting a relation R into two or more subrelations R1 and 
R2. A fully normalized relation must have a primary key and a set of 
attributes.


Database Design Goal (Schema Refinement) 


There are many goals for the design of a database. Some of them are 
listed here 
The Database is Comprehensive It includes all the needed data and 
connections. 
The Database is Understandable There is a clear structure which leads 
to easy, flexible and fast reading and updating of the data. 
The Database is Expandable It is possible to change the structure of the 
database with a minimum change to the existing software. 
• 0% redundancy 
Decomposition should satisfy 

(i) Lossless join (ii) Dependency preservation


Lossless Join Decomposition 
Joining the subrelations should not create any additional (spurious) tuples, 
i.e., there should not be more tuples in R1 ⋈ R2 than in R. 

R ⊂ R1 ⋈ R2 ⇒ (Lossy) 
R = R1 ⋈ R2 ⇒ (Lossless) 

Dependency Preservation 

Because of decomposition, not even a single dependency should be lost.
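For a decomposition of R into two relations, a standard test is that the common attributes R1 ∩ R2 must functionally determine all of R1 or all of R2. The Python sketch below applies that test using attribute closure. The schema R(A, B, C) and the FD A → B are hypothetical examples. Single-letter attribute names are assumed, so a string such as "AB" stands for a set of attributes:

```python
def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary test: (R1 ∩ R2)+ must contain all of R1 or all of R2."""
    c = closure(set(r1) & set(r2), fds)
    return set(r1) <= c or set(r2) <= c

F = [("A", "B")]                    # A -> B over R(A, B, C)
print(is_lossless("AB", "AC", F))   # True: common A determines A, B
print(is_lossless("AB", "BC", F))   # False: common B determines only B
```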
Functional Dependency (FD) 

A dependency between attributes is known as a functional dependency. Let 
R be the relational schema, X and Y be non-empty sets of attributes 
and t1, t2, ..., tn be the tuples of relation R. 

X → Y {values for X functionally determine values for Y} 
If the above condition holds, it means that if 

t1.X = t2.X, then t1.Y = t2.Y 

Tuples  X  Y 
t1      a  b 
t2      a  b 
Relation R1, which holds X → Y 

Tuples  X  Y 
t1      a  b 
t2      a  c 
Relation R2, which does not hold X → Y
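The definition translates directly into a check over a relation instance. As with keys, an instance can only refute an FD; it cannot prove that the FD holds for the schema. This Python sketch applies a hypothetical holds helper to R1 and R2 from above:

```python
def holds(rows, X, Y):
    """X -> Y holds if tuples that agree on X always agree on Y."""
    seen = {}
    for t in rows:
        x = tuple(t[a] for a in X)
        y = tuple(t[a] for a in Y)
        if seen.setdefault(x, y) != y:   # same X value, different Y value
            return False
    return True

R1 = [{"X": "a", "Y": "b"}, {"X": "a", "Y": "b"}]
R2 = [{"X": "a", "Y": "b"}, {"X": "a", "Y": "c"}]
print(holds(R1, ["X"], ["Y"]))  # True
print(holds(R2, ["X"], ["Y"]))  # False
```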


Trivial Functional Dependency 
If X ⊇ Y, then X → Y will be a trivial FD. 

Here, X and Y are sets of attributes of a relation R. 

In a trivial FD, every attribute on the right side of the '→' arrow also appears on 
the left side. 

e.g., Sid Sname → Sname and Sid Sname → Sid are trivial FDs. 

Non-Trivial Functional Dependency 
If X ∩ Y = ∅ (no common attributes) and X → Y satisfies FD, then it will be a 
non-trivial FD. 

e.g., Sid → Sname is a non-trivial FD 
(no common attribute on either side of the '→' arrow).
Case of semi-trivial FD 
Sid → Sid Sname (semi-trivial) 

Because on decomposition, we will get 
Sid → Sid (trivial FD) and 
Sid → Sname (non-trivial FD)


Properties of Functional Dependency (FD) 
Reflexivity If X ⊇ Y, then X → Y (trivial) 
Transitivity If X → Y and Y → Z, then X → Z 
Augmentation If X → Y, then XZ → YZ 
Splitting or Decomposition If X → YZ, then X → Y and X → Z 
Union If X → Y and X → Z, then X → YZ


Attribute Closure 
Suppose R(X, Y, Z) is a relation with the set of attributes {X, Y, Z}. Then the 
attribute closure X+ is the set of all attributes of the relation that are 
functionally determined by X (if not all of them, then at least X itself).
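Attribute closure is usually computed by starting from X and repeatedly applying every FD whose left side is already contained in the result, until nothing more changes. This Python sketch uses a hypothetical closure helper, assumes single-letter attribute names, and takes the FDs AB → C and C → D. If the closure covers every attribute of the relation, the starting set is a super key:

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:                      # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            # whole left side already in the closure: add the right side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("AB", "C"), ("C", "D")]          # hypothetical FDs over R(A, B, C, D)
ab_plus = closure("AB", F)
c_plus = closure("C", F)
print(sorted(ab_plus))  # ['A', 'B', 'C', 'D'] -> AB is a super key of R(ABCD)
print(sorted(c_plus))   # ['C', 'D']
```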


DBMS versus RDBMS 

DBMS | RDBMS 
In DBMS, the relationship between two tables or files is maintained programmatically. | In RDBMS, the relationship between two tables or files can be specified at the time of table creation. 
DBMS does not support client/server architecture. | Most RDBMSs support client/server architecture. 
DBMS does not support distributed databases. | Most RDBMSs support distributed databases. 
In DBMS, there is no security of data. | In RDBMS, there are multiple levels of security: (i) logging in at O/S level (ii) command level (iii) object level. 
Each table is given an extension in DBMS. | Many tables are grouped in one database in RDBMS. 



Normal Forms/Normalization 


In relational database design, normalization is the process of 
organizing data to minimize redundancy. Normalization usually involves 
dividing a database into two or more tables and defining relationships 
between the tables. 

The normal forms define the status of a relation with respect to its individual 
attributes. The normal forms discussed here are 1NF, 2NF, 3NF, BCNF, 4NF and 5NF.


First Normal Form (1NF) 


A relation should not contain any multivalued attributes, i.e., the relation 
should contain only atomic attributes.


Anomalies in Database 

Stud_Id  Stud_Name  Course_Name 
S1       A          C/C++ 
S2       B          C++ 
S3       B          C++/Java 

(STUDENT) Relation 
The above relation is not in 1NF. 

Stud_Id  Stud_Name  Course_Name 
S1       A          C 
S1       A          C++ 
S2       B          C++ 
S3       B          C++ 
S3       B          Java 

The above relation is in 1NF, but now Stud_Id is no longer a primary key. 

Now, in the above case, we need to modify our primary key, which becomes 
(Stud_Id, Course_Name). 

Note The main disadvantage of 1NF is high redundancy.
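The conversion to 1NF shown above can be sketched in Python: each multivalued Course_Name is split into separate rows. Splitting on "/" is an assumption based on how the example writes its course lists:

```python
# Non-1NF STUDENT relation: Course_Name holds several values in one cell
student = [("S1", "A", "C/C++"), ("S2", "B", "C++"), ("S3", "B", "C++/Java")]

# 1NF: one row per (student, single course), so every value is atomic
student_1nf = [(sid, name, course)
               for sid, name, courses in student
               for course in courses.split("/")]

for row in student_1nf:
    print(row)

# Stud_Id alone now repeats, but (Stud_Id, Course_Name) is still unique
ids = [r[0] for r in student_1nf]
pairs = {(r[0], r[2]) for r in student_1nf}
print(len(set(ids)) < len(ids), len(pairs) == len(student_1nf))  # True True
```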
Second Normal Form (2NF) 
Relation R is in 2NF if and only if 
• R is in 1NF. 
• R does not contain any partial dependency. 

Partial Dependency 
Let R be the relational schema having X, Y, A, which are non-empty sets of 
attributes, where 
X = any candidate key of the relation, 
Y = a proper subset of a candidate key, 
A = a non-prime attribute (i.e., A does not belong to any candidate key). 

[Figure: Model showing partial dependency] 

In the above example, X → A already exists, and if Y → A also exists, then it 
will be a partial dependency if and only if 
• Y is a proper subset of a candidate key, and 
• A is a non-prime attribute. 

If either of the above two conditions fails, then Y → A is not a partial 
dependency.


Full Functional Dependency 
A functional dependency P → Q is said to be a full functional dependency if 
removal of any attribute S from P means that the dependency does not hold 
any more. 

(Student_Name, College_Name → College_Address) 
Suppose the above functional dependency is a full functional 
dependency; then we must ensure that there are no FDs such as the following. 

(Student_Name → College_Address) 
or (College_Name → College_Address)


Third Normal Form (3NF) 

Let R be a relational schema. R is in 3NF if, for every non-trivial FD X → Y over R, 
• X is a candidate key or super key, 
or 
• Y is a prime attribute. 
• At least one of the above two conditions must be true (both may be true). 
• R does not contain any transitive dependency. 
• For a relation schema R to be in 3NF, it is necessary for it to be in 2NF.
Transitive Dependency 

An FD X → Y in a relation schema R is transitive if 
• there is a set of attributes Z that is not a subset of any key of R, and 
• both X → Z and Z → Y hold. 

[Figure: Diagram showing transitive dependency: candidate key or super key → attribute(s) that are not a candidate key → non-prime attribute] 

R1 (ABCD): AB → C, C → D      R2 (DE): D → E 
• The above relation is in 2NF. 
• In relation R1, C is not a candidate key and D is a non-prime attribute. Due to 
this, R1 fails to satisfy the 3NF condition. A transitive dependency is present 
here: 

AB → C and C → D, so AB → D is transitive.


Boyce-Codd Normal Form (BCNF) 

Let R be a relation schema. R is in BCNF if and only if, for every non-trivial FD 
X → Y over R, X is a candidate key or super key. 

If R satisfies this condition, then it also satisfies 2NF and 3NF. 

[Figure: Hierarchy of normal forms: 1NF ⊇ 2NF ⊇ 3NF ⊇ BCNF, i.e., every relation in a higher normal form is also in all the lower ones]
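Once candidate keys are known, the 3NF and BCNF conditions can be checked mechanically. The Python sketch below uses only hypothetical helper names and assumes single-letter attributes. It finds candidate keys by brute force with attribute closure and then tests each given FD. Checking only the given FDs is sufficient when testing the original relation, but not when testing its decompositions. The sketch confirms that R1(ABCD) with AB → C, C → D is in neither 3NF nor BCNF. It also adds a second hypothetical schema, R(ABC) with AB → C and C → B, which is in 3NF but not in BCNF:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes functionally determined by attrs under the FD list fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Brute force: minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if closure(combo, fds) >= set(schema) and not any(k <= set(combo) for k in keys):
                keys.append(set(combo))
    return keys

def check_normal_forms(schema, fds):
    """Return (candidate keys, in BCNF?, in 3NF?) testing only the given FDs."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    bcnf = third = True
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)
        if not extra:
            continue                      # trivial FD: never a violation
        if closure(lhs, fds) >= set(schema):
            continue                      # left side is a super key
        bcnf = False                      # BCNF needs a super key on the left
        if not extra <= prime:
            third = False                 # 3NF also accepts prime right sides
    return ["".join(sorted(k)) for k in keys], bcnf, third

print(check_normal_forms("ABCD", [("AB", "C"), ("C", "D")]))  # (['AB'], False, False)
print(check_normal_forms("ABC", [("AB", "C"), ("C", "B")]))   # (['AB', 'AC'], False, True)
```

Brute-force key search grows exponentially with the number of attributes, so this sketch is only meant for small textbook schemas.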
Summary of 1NF, 2NF and 3NF 

Normal Form | Test | Remedy (Normalization) 
1NF | Relation should have no non-atomic attributes or nested relations. | Form a new relation for each non-atomic attribute or nested relation. 
2NF | For relations where the primary key contains multiple attributes, no non-key attribute should be functionally dependent on a part of the primary key. | Decompose and set up a new relation for each partial key with its dependent attribute(s). Make sure to keep a relation with the original primary key and any attributes that are fully functionally dependent on it. 
3NF | Relation should not have a non-key attribute functionally determined by another non-key attribute (or by a set of non-key attributes), i.e., there should be no transitive dependency of a non-key attribute on the primary key. | Decompose and set up a relation that includes the non-key attribute(s) that functionally determine(s) other non-key attribute(s).


Fourth Normal Form (4NF) 


4NF is mainly concerned with multivalued dependency. A relation is in 4NF 
if and only if, for every one of its non-trivial multivalued dependencies 
X →→ Y, X is a super key (i.e., X is either a candidate key or a superset of one).


Fifth Normal Form (5NF) 


It is also known as Project-Join Normal Form (PJ/NF). 5NF reduces 
redundancy in relational databases recording multivalued facts by isolating 
semantically related multiple relationships. 

A table or relation is said to be in 5NF if and only if every join 
dependency in it is implied by the candidate keys.


Key Points 

• The Normal Forms (NF) of relational database theory provide criteria for 
determining a table's degree of vulnerability to logical inconsistencies and 
anomalies. 
• Databases intended for Online Transaction Processing (OLTP) are typically 
more normalized than databases intended for Online Analytical Processing (OLAP). 
• OLTP applications are characterized by a high volume of small transactions, 
such as updating a sales record at a supermarket checkout counter. 
• The expectation is that each transaction will leave the database in a consistent state.


SQL 


Structured Query Language (SQL) is a language that provides an interface to 
relational database systems. SQL was developed by IBM in the 1970s for use in 
System R, and it is a de facto standard as well as an ISO and ANSI standard. 

[Figure: Database objects: schema, table, view, index and synonyms/alias; database application objects: sequences, triggers, user defined functions (UDFs) and stored procedures]


To deal with the above database objects, we need a programming 
language and that programming language is known as SQL. 


The subordinate languages of SQL are as follows


Data Definition Language (DDL) 
It includes the commands as 


CREATE To create tables in the database. 

ALTER To modify the existing table structure. 

DROP To drop the table along with its structure.

Data Manipulation Language (DML) It is used to insert, delete, update 
data and perform queries on these tables. Some of the DML commands 
are given below. 

INSERT To insert data into the table. 

SELECT To retrieve data from the table. 

UPDATE To update existing data in the table. 

DELETE To delete data from the table. 


Data Control Language (DCL) 
It is used to control user’s access to the database objects. Some of the 
DCL commands are 


GRANT Used to grant select/insert/delete access. 
REVOKE Used to revoke the provided access. 
Transaction Control Language (TCL) 
It is used to manage changes affecting the data. 

• COMMIT To save the work done, such as inserting, updating or 
deleting data to/from the table. 
• ROLLBACK To restore the database to its state as of the last commit.
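The two TCL commands can be tried with Python's sqlite3 module. In this sketch, isolation_level=None turns off the module's implicit transaction handling, so the BEGIN, COMMIT and ROLLBACK statements issued below are the only transaction control in effect. The item table is hypothetical:

```python
import sqlite3

# isolation_level=None: no implicit transactions; we issue BEGIN/COMMIT/ROLLBACK
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE item (name TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO item VALUES ('pen')")
conn.execute("COMMIT")            # 'pen' is now saved permanently

conn.execute("BEGIN")
conn.execute("INSERT INTO item VALUES ('book')")
conn.execute("DELETE FROM item WHERE name = 'pen'")
conn.execute("ROLLBACK")          # undo everything since the last COMMIT

final = conn.execute("SELECT name FROM item").fetchall()
print(final)  # [('pen',)]
```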


SQL Data Types SQL data types specify the type, size and format of 
data/information that can be stored in columns and variables.


Key Points 

• SQL is a special-purpose programming language designed for managing data 
held in a Relational Database Management System (RDBMS). 
• SQL is based upon relational algebra and tuple relational calculus. SQL consists 
of a data definition language and a data manipulation language.


Various Data Types in SQL 

• Date • Time • Timestamp 
• Char • Bigint • Integer 
• Decimal • Smallint • Double 

There are many other data types as well.


Database Constraints 


Constraints are user-defined rules that let us restrict the behaviour of a column. 
We can create constraints when we define a table with an SQL CREATE statement. 

Inline Constraint 
A constraint defined on the same line as its column. 

Out of Line Constraint 
A constraint defined on its own line in a CREATE statement. This type of 
constraint must reference the column(s) that it constrains.
Constraint Types with Description

Name of constraint | Table level/Column level/Row level/External level | Description
NOT NULL | Column level | Restricts a column by making it mandatory to have some value.
UNIQUE | Table level | Checks whether a column value will be unique among all rows in a table.
PRIMARY KEY | Column level and Table level | Checks whether a column value will be unique among all rows in a table and disallows NULL values.
CHECK constraint | Column level, Row level, External level | Restricts a column value to a set of values defined by the constraint.
FOREIGN KEY constraint | Column level, External level | Restricts the values that are acceptable in a column or group of columns of a table to those values found in the column or group of columns used to define the primary key in another table.
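The table above can be tried out with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in DBMS (the handbook does not name one), and the Department table, Dept_Id and Salary columns are hypothetical. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON. Both inline constraints (on the column's own line) and out-of-line constraints (PRIMARY KEY and FOREIGN KEY on their own lines) are shown.

```python
import sqlite3

# In-memory database; SQLite checks foreign keys only when this PRAGMA is on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE Department (
        Dept_Id   INTEGER PRIMARY KEY,          -- inline PRIMARY KEY
        Dept_Name TEXT NOT NULL UNIQUE          -- inline NOT NULL and UNIQUE
    )""")
conn.execute("""
    CREATE TABLE Employee (
        Emp_Id  INTEGER NOT NULL,
        Salary  INTEGER CHECK (Salary > 0),     -- CHECK restricts the allowed values
        Dept_Id INTEGER,
        PRIMARY KEY (Emp_Id),                   -- out-of-line constraint
        FOREIGN KEY (Dept_Id) REFERENCES Department (Dept_Id)
    )""")
conn.execute("INSERT INTO Department VALUES (1, 'IT')")
conn.execute("INSERT INTO Employee VALUES (1, 30000, 1)")

def try_insert(sql):
    """Run one INSERT; return the constraint error message, or None if it succeeded."""
    try:
        conn.execute(sql)
        return None
    except sqlite3.IntegrityError as err:
        return str(err)

# Each statement below violates exactly one constraint.
violations = {
    "NOT NULL":    try_insert("INSERT INTO Department VALUES (2, NULL)"),
    "UNIQUE":      try_insert("INSERT INTO Department VALUES (3, 'IT')"),
    "PRIMARY KEY": try_insert("INSERT INTO Employee VALUES (1, 25000, 1)"),
    "CHECK":       try_insert("INSERT INTO Employee VALUES (2, -5, 1)"),
    "FOREIGN KEY": try_insert("INSERT INTO Employee VALUES (3, 20000, 99)"),
}
for name, message in violations.items():
    print(f"{name:12} -> {message}")
```

Every rejected row leaves the tables unchanged, so Employee still holds only its first row afterwards.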


Key Points


+ The most common operation in SQL is the query, which is performed with 
the declarative SELECT statement. 

+ SELECT retrieves data from one or more tables, or expressions. 

+ Standard SELECT statements have no persistent effects on the database. 

+ Queries allow the user to describe desired data, leaving the Database 
Management System (DBMS) responsible for planning, optimizing, and 
performing the physical operations necessary to produce that result as it chooses. 


Default Constraint 
It is used to insert a default value into a column, if no other value is 
specified at the time of insertion. 


Syntax 
CREATE TABLE Employee
(
Emp_id int NOT NULL,
Last_Name varchar(250),
City varchar(50) DEFAULT 'BANGALURU'
);
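A quick way to see the DEFAULT constraint at work is to run the statement above and insert one row without a City. This sketch assumes SQLite (via Python's sqlite3) as the DBMS; SQLite accepts the varchar(n) type names but does not enforce the length. The surnames are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        Emp_id    int NOT NULL,
        Last_Name varchar(250),
        City      varchar(50) DEFAULT 'BANGALURU'
    )""")
# City is omitted, so the DEFAULT value is stored.
conn.execute("INSERT INTO Employee (Emp_id, Last_Name) VALUES (1, 'Sharma')")
# An explicitly supplied value overrides the default.
conn.execute("INSERT INTO Employee (Emp_id, Last_Name, City) VALUES (2, 'Verma', 'LUCKNOW')")

rows = conn.execute("SELECT Emp_id, City FROM Employee ORDER BY Emp_id").fetchall()
print(rows)  # [(1, 'BANGALURU'), (2, 'LUCKNOW')]
```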


DDL Commands 
1. CREATE TABLE <Table_Name>
(
Column_name1 <data_type>,
Column_name2 <data_type>
);
2. ALTER TABLE <Table_Name>
ALTER COLUMN <Column_Name> SET NOT NULL;
3. RENAME <object_type> <object_name> TO <new_name>;
4. DROP TABLE <Table_Name>;


DML Commands 
SELECT A1, A2, A3, ..., An     what to return
FROM R1, R2, R3, ..., Rm       relations or tables
WHERE condition                filter condition, i.e., on what basis we want to restrict the outcome/result


If we want to write the above SQL query in the form of relational algebra, we use the following expression
$\pi_{A_1, A_2, \ldots, A_n}(\sigma_{\text{condition}}(R_1 \times R_2 \times \cdots \times R_m))$
Comparison operators which we can use in the filter condition are
(=, >, <, >=, <=, <>), where '<>' means not equal to.
INSERT Statement 
Used to add row(s) to a table in a database.
INSERT INTO Employee (F_Name, L_Name) VALUES ('Atal', 'Bihari');


Key Points


+ Data values can be added either in every column of a row or in only some columns of a row, by specifying those columns and their data.


+ All the columns that are not listed in the column list of an INSERT statement receive NULL (or their DEFAULT value, if one has been defined).
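The two key points can be checked directly. This sketch assumes SQLite through Python's sqlite3; the second row ('Ram', 'Kumar') and the City column are hypothetical. A column missing from the column list comes back as Python's None, which is how sqlite3 represents NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (F_Name TEXT, L_Name TEXT, City TEXT)")

# Only F_Name and L_Name are listed, so City receives NULL (no DEFAULT is defined).
conn.execute("INSERT INTO Employee (F_Name, L_Name) VALUES ('Atal', 'Bihari')")
# Without a column list, a value must be supplied for every column, in table order.
conn.execute("INSERT INTO Employee VALUES ('Ram', 'Kumar', 'LUCKNOW')")

rows = conn.execute("SELECT F_Name, L_Name, City FROM Employee ORDER BY rowid").fetchall()
print(rows)  # [('Atal', 'Bihari', None), ('Ram', 'Kumar', 'LUCKNOW')]
```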


UPDATE Statement 


It is used to modify/update or change existing data in single row, group of 
rows or all the rows in a table. 


e.g.,
UPDATE Employee
SET City = 'LUCKNOW'
WHERE Emp_Id BETWEEN 9 AND 15;
An example of selective update, which will update some rows in a table.

UPDATE Employee SET City = 'LUCKNOW';
An example of global update, which will update the City column for all the rows.




DELETE Statement 
This is used to delete rows from a table. 


e.g.,
DELETE FROM Employee WHERE Emp_Id = 7;
An example of selective delete.
DELETE FROM Employee;
This command will delete all the rows from the Employee table.
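The difference between selective and global UPDATE/DELETE shows up in the number of affected rows. A sketch assuming SQLite via Python's sqlite3, with 20 hypothetical employees who all start in 'DELHI'. The final row count is read back with COUNT(*) rather than from rowcount.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_Id INTEGER, City TEXT)")
# Hypothetical sample data: employees 1..20, all initially in 'DELHI'.
conn.executemany("INSERT INTO Employee VALUES (?, 'DELHI')",
                 [(i,) for i in range(1, 21)])

# Selective update: BETWEEN is inclusive, so Emp_Id 9..15 (7 rows) change.
selective_updated = conn.execute(
    "UPDATE Employee SET City = 'LUCKNOW' WHERE Emp_Id BETWEEN 9 AND 15").rowcount

# Selective delete: only the row with Emp_Id = 7 is removed.
selective_deleted = conn.execute("DELETE FROM Employee WHERE Emp_Id = 7").rowcount

# Global update: no WHERE clause, so every remaining row (19) is updated.
global_updated = conn.execute("UPDATE Employee SET City = 'LUCKNOW'").rowcount

# Global delete: no WHERE clause, so the table is emptied.
conn.execute("DELETE FROM Employee")
left = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]

print(selective_updated, selective_deleted, global_updated, left)  # 7 1 19 0
```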
ORDER BY Clause 


This clause is used to sort the result of a query in a specific order 
(ascending or descending). 


By default, the sorting order is ascending.
SELECT Emp_Id, Emp_Name, City FROM Employee
WHERE City = 'LUCKNOW'
ORDER BY Emp_Id DESC;


GROUP BY Clause 
It is used to divide the result set into groups. Grouping can be done by a 
column name or by the results of computed columns when using numeric 
data types. 
• The HAVING clause can be used to set conditions on the groups formed by the GROUP BY clause.
• The HAVING clause is similar to the WHERE clause, but HAVING puts conditions on groups.
• The WHERE clause places conditions on individual rows.
• The WHERE clause cannot include aggregate functions, while HAVING conditions can.
e.g.,
SELECT City, AVG(Salary)
FROM Employee
GROUP BY City
HAVING AVG(Salary) > 25000;
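The WHERE-versus-HAVING distinction is easiest to see when both appear in one query. This sketch assumes SQLite via Python's sqlite3, and the salary figures are hypothetical. The low DELHI salary is removed by WHERE before grouping, which lifts DELHI's average above the HAVING threshold. The last part shows that an aggregate inside WHERE is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_Id INTEGER, City TEXT, Salary INTEGER)")
# Hypothetical sample data.
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", [
    (1, 'LUCKNOW', 20000), (2, 'LUCKNOW', 40000),   # group average 30000
    (3, 'DELHI', 30000), (4, 'DELHI', 2000),        # the 2000 row is dropped by WHERE
    (5, 'PUNE', 20000),                             # group average 20000
])

# WHERE filters individual rows first; HAVING then filters the groups.
rows = conn.execute("""
    SELECT City, AVG(Salary)
    FROM Employee
    WHERE Salary > 10000
    GROUP BY City
    HAVING AVG(Salary) > 25000
    ORDER BY City""").fetchall()
print(rows)  # [('DELHI', 30000.0), ('LUCKNOW', 30000.0)]

# An aggregate function is not allowed in WHERE; SQLite reports an error.
try:
    conn.execute("SELECT City FROM Employee WHERE AVG(Salary) > 25000")
    aggregate_in_where_ok = True
except sqlite3.OperationalError:
    aggregate_in_where_ok = False
print(aggregate_in_where_ok)  # False
```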


Aggregate Functions 





Function | Description
SUM() | Returns the total sum of the values in a column.
AVG() | Returns the average of the values in a column.
COUNT() | Returns the number of non-NULL values in a column.
MIN() and MAX() | Return the lowest and highest value, respectively, in a column.
COUNT(*) | Counts the total number of rows in a table.
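The COUNT() versus COUNT(*) distinction matters once a column contains NULL. A sketch assuming SQLite via Python's sqlite3, with a hypothetical employee (Emp_Id 3) whose salary is NULL. SUM, AVG, MIN, MAX and COUNT(Salary) skip that row, while COUNT(*) still counts it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_Id INTEGER, Salary INTEGER)")
# Hypothetical data: employee 3 has no recorded salary (NULL).
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [(1, 10000), (2, 20000), (3, None), (4, 30000)])

total, average, non_null, all_rows, lowest, highest = conn.execute("""
    SELECT SUM(Salary), AVG(Salary), COUNT(Salary), COUNT(*),
           MIN(Salary), MAX(Salary)
    FROM Employee""").fetchone()

# AVG = 60000 / 3, because the NULL salary is ignored.
print(total, average, non_null, all_rows, lowest, highest)
# 60000 20000.0 3 4 10000 30000
```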





Joins 


Joins are needed to retrieve related rows from two tables on the basis of some condition that involves both tables. The mandatory condition for a join is that at least one column (or set of columns) in each table takes its values from the same domain.


Types of Joins 


The two main types of joins are given below


Inner Join 
Inner join is the most common join operation used in applications and can 
be regarded as the default join-type. Inner join creates a new result table by 
combining column values of two tables (A and B) based upon the 
join-predicate. Inner joins may be further divided into three types.

(i) Equi Join (satisfies an equality condition)

(ii) Non-Equi Join (satisfies a non-equality condition)

(iii) Self Join (a table is joined with itself; one or more columns take values from the same domain)

An inner join considers only the pairs of rows that satisfy the join condition.

In Venn-diagram terms, the result is the intersection of the two tables.


























SELECT * FROM T1 INNER JOIN T2 ON T1.C1 = T2.D1;

T1            T2
C1   C2       D1   D2
1    A        1    YES
2    B        3    NO

(Figure: inner join of set T1 and set T2)

Result set of T1 and T2
T1.C1   C2   T2.D1   D2
1       A    1       YES
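The example above can be reproduced with the same two tables. This sketch assumes SQLite via Python's sqlite3. It runs the equi join from the text and then a non-equi join (T1.C1 < T2.D1) to show how changing the predicate changes which pairs survive.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (C1 INTEGER, C2 TEXT)")
conn.execute("CREATE TABLE T2 (D1 INTEGER, D2 TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?, ?)", [(1, 'A'), (2, 'B')])
conn.executemany("INSERT INTO T2 VALUES (?, ?)", [(1, 'YES'), (3, 'NO')])

# Equi join: only the pair of rows with T1.C1 = T2.D1 survives.
inner = conn.execute(
    "SELECT * FROM T1 INNER JOIN T2 ON T1.C1 = T2.D1").fetchall()
print(inner)  # [(1, 'A', 1, 'YES')]

# Non-equi join: the predicate uses a comparison other than '='.
non_equi = conn.execute(
    "SELECT * FROM T1 INNER JOIN T2 ON T1.C1 < T2.D1 ORDER BY C1").fetchall()
print(non_equi)  # [(1, 'A', 3, 'NO'), (2, 'B', 3, 'NO')]
```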























Key Points


+ The cross join does not apply any predicate to filter records from the joined 
table. Programmers can further filter the results of a cross join by using 
a WHERE clause. 

+ An equi-join is a specific type of comparator-based join that uses only equality comparisons in the join-predicate.

+ As a special case, a table (base table, view, or joined table) can JOIN to itself in a self-join.
Outer Join


An outer join does not require each record in the two joined tables to have a matching record. The joined table retains each record, even if no matching record exists.


It also considers the rows from one or both tables even if they do not satisfy the join condition.


(i) Right outer join (ii) Left outer join (iii) Full outer join 
Left Outer Join 
The result of a left outer join for tables A and B always contains all records of the left table (A), even if the join condition does not find any matching record in the right table (B).

SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.ID = T2.ID;

T1              T2
ID   Name       ID   Branch
1    Ram        1    IT
2    Shyam      3    CS

(Figure: left outer join of T1 and T2)

Result set of T1 and T2
T1.ID   Name    T2.ID   Branch
1       Ram     1       IT
2       Shyam   NULL    NULL
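The same tables can be used to run the left outer join. This sketch assumes SQLite via Python's sqlite3, where NULL comes back as None. Shyam has no partner in T2, so T2's columns are NULL in his row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (ID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE T2 (ID INTEGER, Branch TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?, ?)", [(1, 'Ram'), (2, 'Shyam')])
conn.executemany("INSERT INTO T2 VALUES (?, ?)", [(1, 'IT'), (3, 'CS')])

# Every T1 row appears; unmatched T1 rows get NULL (None) for T2's columns.
left = conn.execute("""
    SELECT * FROM T1 LEFT OUTER JOIN T2 ON T1.ID = T2.ID
    ORDER BY T1.ID""").fetchall()
print(left)  # [(1, 'Ram', 1, 'IT'), (2, 'Shyam', None, None)]
```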




















Right Outer Join 

A right outer join closely resembles a left outer join, except that the treatment of the tables is reversed. Every row from the right table will appear in the joined table at least once. If no matching row exists in the left table, NULL will appear in the columns of the left table.





SELECT * FROM T1 RIGHT OUTER JOIN T2 ON T1.ID = T2.ID;

(Using the tables T1 and T2 of the previous example)

(Figure: right outer join of T1 and T2)

Result set of T1 and T2
T1.ID   Name   T2.ID   Branch
1       Ram    1       IT
NULL    NULL   3       CS








Full Outer Join


A full outer join combines the effect of applying both left and right outer joins. Where records in the joined tables do not match, the result set will have NULL values for every column of the table that lacks a matching row. For those records that do match, a single row will be produced in the result set.


SELECT * FROM T1 FULL OUTER JOIN T2 ON T1.ID = T2.ID;

(Figure: full outer join of T1 and T2)

Result set of T1 and T2 (using the tables of the previous example)
T1.ID   Name    T2.ID   Branch
1       Ram     1       IT
2       Shyam   NULL    NULL
NULL    NULL    3       CS
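Not every DBMS supports RIGHT and FULL OUTER JOIN. Older SQLite releases, before 3.39, support neither. Both can be expressed with LEFT OUTER JOIN alone, and this sketch does that with SQLite via Python's sqlite3 as an assumption. The right outer join is a left outer join with the tables swapped. The full outer join is the left outer join plus the T2 rows that have no match in T1.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (ID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE T2 (ID INTEGER, Branch TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?, ?)", [(1, 'Ram'), (2, 'Shyam')])
conn.executemany("INSERT INTO T2 VALUES (?, ?)", [(1, 'IT'), (3, 'CS')])

# Right outer join written as a left outer join with the tables swapped;
# the column list restores the T1-then-T2 column order.
right = conn.execute("""
    SELECT T1.ID, T1.Name, T2.ID, T2.Branch
    FROM T2 LEFT OUTER JOIN T1 ON T1.ID = T2.ID
    ORDER BY T2.ID""").fetchall()
print(right)  # [(1, 'Ram', 1, 'IT'), (None, None, 3, 'CS')]

# Full outer join = left outer join plus the T2 rows that found no match in T1.
full = conn.execute("""
    SELECT T1.ID, T1.Name, T2.ID, T2.Branch
    FROM T1 LEFT OUTER JOIN T2 ON T1.ID = T2.ID
    UNION ALL
    SELECT NULL, NULL, T2.ID, T2.Branch
    FROM T2
    WHERE NOT EXISTS (SELECT 1 FROM T1 WHERE T1.ID = T2.ID)""").fetchall()
print(full)  # same three rows as the full outer join table above (order not guaranteed)
```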




















Cross Join (Cartesian Product) 


Cross join returns the Cartesian product of the rows from the tables in the join. It produces rows that combine each row from the first table with each row from the second table.

SELECT * FROM T1, T2;

Number of rows in result set = (Number of rows in table 1) × (Number of rows in table 2)


Result set of T1 and T2 (using the previous tables T1 and T2, so 2 × 2 = 4 rows)



































T1.ID   Name    T2.ID   Branch
1       Ram     1       IT
2       Shyam   1       IT
1       Ram     3       CS
2       Shyam   3       CS
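The row-count formula can be checked on the same tables, and adding a WHERE predicate to a cross join gives back the inner join, as the key points below note. A sketch assuming SQLite via Python's sqlite3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (ID INTEGER, Name TEXT)")
conn.execute("CREATE TABLE T2 (ID INTEGER, Branch TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?, ?)", [(1, 'Ram'), (2, 'Shyam')])
conn.executemany("INSERT INTO T2 VALUES (?, ?)", [(1, 'IT'), (3, 'CS')])

# Listing both tables in FROM with no predicate yields the Cartesian product.
product = conn.execute("SELECT * FROM T1, T2").fetchall()
n1 = conn.execute("SELECT COUNT(*) FROM T1").fetchone()[0]
n2 = conn.execute("SELECT COUNT(*) FROM T2").fetchone()[0]
print(len(product), n1 * n2)  # 4 4

# Filtering the cross join with a WHERE predicate gives the inner join's rows.
filtered = conn.execute("SELECT * FROM T1, T2 WHERE T1.ID = T2.ID").fetchall()
print(filtered)  # [(1, 'Ram', 1, 'IT')]
```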
Key Points


+ A programmer writes a JOIN predicate to identify the records for joining. If 
the evaluated predicate is true, the combined record is then produced in the 
expected format, a record set or a temporary table. 

+ A right outer join returns all the values from the right table and matched 
values from the left table (NULL in case of no matching join predicate). 

+ A left outer join returns all the values from an inner join plus all values in the 
left table that do not match to the right table. 


Transaction Management 


A transaction is a sequence of actions that is considered to be one atomic unit of work. It is a collection of operations involving data items in a database. There are four important properties of transactions that a DBMS must ensure to maintain data in the face of concurrent access and system failures.


Atomicity 
Atomicity requires that each transaction is all or nothing. If one part of the 
transaction fails, the entire transaction fails, and the database state is left 
unchanged. 


Consistency 
If each transaction is consistent and the database starts in a consistent state, it ends up in a consistent state.


Isolation 

The execution of one transaction is isolated from that of other transactions. Isolation ensures that the concurrent execution of transactions results in a system state that would be obtained if the transactions were executed serially, i.e., one after the other.


Durability 

Durability means that once a transaction has been committed, it will remain 
even in the event of power loss, crashes, or errors. In a relational database, 
for instance, once a group of SQL statements execute, the results need to 
be stored permanently. 


Key Points


+ In computer science, ACID (Atomicity, Consistency, Isolation, Durability) is a set 
of properties that guarantee that database transactions are processed reliably. 


+ In the context of databases, a single logical operation on the data is called a 
transaction. 


+ A transfer of funds from one bank account to another, even involving multiple 
changes such as debiting one account and crediting another, is a single transaction. 


If a transaction commits, its effects persist. 


• A transaction starts with any SQL statement and ends with a COMMIT or ROLLBACK.
• The COMMIT statement makes changes permanent in the database.
• The ROLLBACK statement reverses changes.


• A transaction includes one or more database access operations. These can include insertion, deletion, modification or retrieval operations.
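The fund-transfer example from the key points can be run as a real transaction. This sketch assumes SQLite via Python's sqlite3, and the Account table and its CHECK rule are hypothetical. isolation_level=None switches the module to autocommit, so BEGIN, COMMIT and ROLLBACK are issued explicitly. When the debit would make a balance negative, ROLLBACK also undoes the credit that already ran. Both updates survive or neither does, which is atomicity.

```python
import sqlite3

# isolation_level=None: autocommit mode, so transaction statements are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Account (Acc_No INTEGER PRIMARY KEY, "
             "Balance INTEGER CHECK (Balance >= 0))")
conn.execute("INSERT INTO Account VALUES (1, 500), (2, 100)")

def transfer(src, dst, amount):
    """Move money as one atomic unit: both updates are kept, or neither is."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE Account SET Balance = Balance + ? WHERE Acc_No = ?", (amount, dst))
        conn.execute("UPDATE Account SET Balance = Balance - ? WHERE Acc_No = ?", (amount, src))
        conn.execute("COMMIT")        # make both changes permanent
        return True
    except sqlite3.IntegrityError:    # CHECK failed: the balance would go negative
        conn.execute("ROLLBACK")      # undo the credit that already ran
        return False

ok1 = transfer(1, 2, 200)    # succeeds: balances become 300 and 300
ok2 = transfer(2, 1, 1000)   # fails: account 2 holds only 300
print(ok1, ok2)  # True False
print(conn.execute("SELECT Acc_No, Balance FROM Account ORDER BY Acc_No").fetchall())
# [(1, 300), (2, 300)]
```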


Database Access Operations 
The basic database access operations are 
• Read_item(X), written R(X): reads a database item named X into a program variable.


• Write_item(X), written W(X): writes the value of program variable X into the database item named X.


Classification of a Database System 
According to the Number of Users 


Single User 


A DBMS is single user, if at most one user at a time can use the system. 


Multiuser 


A DBMS is multiuser if many users can use the system, and hence access the database, concurrently.


e.g., An airline reservation system, in which hundreds of travel agents and reservation clerks submit transactions concurrently to the system.


Transaction States 


The following are the different states in transaction processing in a database 
system 
1. Active 2. Partially committed 3. Failed 4. Aborted

(Figure: State transition diagram of a transaction, with transitions labelled Begin transaction, Read/Write, End transaction and Abort, leading to the Aborted state.)





Active This is the initial state. The transaction stays in this state while it is executing.
Partially Committed This is the state after the final statement of the transaction has been executed.


Failed The state after the discovery that normal execution can no longer proceed.


Aborted The state after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction.


Isolation Levels 

If no transaction makes its updates visible to other transactions until it is committed, isolation is enforced; this solves the temporary update problem and eliminates cascading rollbacks.

There have been attempts to define the level of isolation of a transaction.
Level-0 Isolation A transaction is said to have level-0 isolation if it does not overwrite the dirty reads of higher-level transactions.

Level-1 Isolation Level-1 isolation has no lost updates.

Level-2 Isolation Level-2 isolation has no lost updates and no dirty reads.
Level-3 Isolation Level-3 isolation (also called true isolation) has repeatable reads in addition to the level-2 properties.


Concurrency Control 


The process of managing the simultaneous execution of transactions in a shared database is known as concurrency control. Basically, concurrency control ensures that correct results are generated for concurrent operations, while getting those results as quickly as possible.


Need of Concurrency Control 


Simultaneous execution of transactions over a shared database can create 
several data integrity and consistency problems. 


Lost Update 

This problem occurs when two transactions that access the same database items have their operations interleaved in a way that makes the value of some database items incorrect.
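The lost update problem can be reproduced by replaying the operations by hand. This is a plain-Python sketch with an assumed encoding: each step is (transaction, operation), each transaction keeps its own program variable for X, and an integer step means "add this to the local copy". Starting from X = 100, T1 adds 10 and T2 adds 20, so a correct result is 130.

```python
def run(schedule, x=100):
    """Execute (transaction, op) steps; each transaction keeps its own local copy of X."""
    local = {}                       # program variables, one per transaction
    for txn, op in schedule:
        if op == "R":
            local[txn] = x           # R(X): read the database item into the local variable
        elif op == "W":
            x = local[txn]           # W(X): write the local variable back to the database
        else:
            local[txn] += op         # integer step: add it to the local copy
    return x

serial = [("T1", "R"), ("T1", 10), ("T1", "W"),
          ("T2", "R"), ("T2", 20), ("T2", "W")]
interleaved = [("T1", "R"), ("T2", "R"), ("T1", 10), ("T2", 20),
               ("T1", "W"), ("T2", "W")]      # T2's write overwrites T1's

print(run(serial))       # 130
print(run(interleaved))  # 120: T1's update is lost
```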


Dirty Read Problems 

This problem occurs when one transaction changes the value of a data item and another transaction reads that value before the first transaction has committed or rolled back.
Inconsistent Retrievals


This problem occurs when a transaction accesses data before and after 
another transaction(s) finish working with such data. 


We need concurrency control when


• The amount of data is so large that at any time only a fraction of the data can be in primary memory, and the rest must be swapped in from secondary memory as needed.

• Even if the entire database can be present in primary memory, there may be multiple processes accessing it.


Key Points
+ A failure in concurrency control can result in data corruption from torn read 
or write operations. 


+ A DBMS also needs to deal with concurrency control issues that are not specific to database transactions but are typical of operating systems in general.


+ Concurrency control is an essential element for correctness in any system where two or more database transactions, executed with time overlap, can access the same data, e.g., virtually any general purpose database system.


Schedule 


A schedule (or history) is a model to describe execution of transactions 
running in the system. When multiple transactions are executing 
concurrently in an interleaved fashion, then the order of execution of 
operations from the various transactions is known as a schedule or we can 
say that a schedule is a sequence of read, write, abort and commit 
operations from a set of transactions. 


The following is an example of a schedule 











T1        T2        T3
R(X)
W(X)
Commit
          R(Y)
          W(Y)
          Commit
                    R(Z)
                    W(Z)
                    Commit
The above schedule consists of three transactions T1, T2 and T3. The first transaction T1 reads and writes object X and then commits. Then, T2 reads and writes object Y and commits, and finally T3 reads and writes object Z and commits.


Classification of Schedules Based on Serializability


The schedules can be classified as 


Serial Schedule 


A schedule in which the different transactions are not interleaved (i.e., 
transactions are executed from start to finish one-by-one). 


(Figure: two serial schedules of transactions T1 and T2 over data items A and B; in each, one transaction runs from start to finish before the other begins.)














Complete Schedule
A schedule that contains either a commit or an abort action for each transaction.


Note Consequently, a complete schedule will not contain any active 
transaction at the end of the schedule. 


(Figure: three complete schedules of transactions T1 and T2 over data items A and B; each schedule ends with one Commit and one Abort, so no transaction is still active at the end.)
Non-serial Schedules
Non-serial schedules are interleaved schedules, and these schedules improve the performance of the system (i.e., throughput and response time). But concurrency, or interleaving of operations in schedules, might lead the database to an inconsistent state.
We need to identify schedules that are

(i) As fast as interleaved schedules.

(ii) As consistent as serial schedules.
Conflicting Operations When two or more transactions in a non-serial schedule execute concurrently, there may be some conflicting operations.
Two operations are said to be conflicting if they satisfy all of the following conditions
• The operations belong to different transactions.
• At least one of the operations is a write operation.
• The operations access the same object or item.
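The three conditions translate directly into a predicate. This is a plain-Python sketch: the Op record and the conflicts function are hypothetical names, and operations are encoded as (transaction, "R" or "W", item).

```python
from typing import NamedTuple

class Op(NamedTuple):
    txn: str    # transaction name, e.g. "T1"
    kind: str   # "R" for read, "W" for write
    item: str   # data item, e.g. "X"

def conflicts(a: Op, b: Op) -> bool:
    """True only when all three conditions for conflicting operations hold."""
    return (a.txn != b.txn                # different transactions
            and "W" in (a.kind, b.kind)   # at least one is a write
            and a.item == b.item)         # same data item

print(conflicts(Op("T1", "R", "X"), Op("T2", "W", "X")))  # True
print(conflicts(Op("T1", "R", "X"), Op("T2", "R", "X")))  # False: no write
print(conflicts(Op("T1", "R", "X"), Op("T2", "W", "Y")))  # False: different items
```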


The following set of operations is conflicting
T1        T2
R(X)
          W(X)

While the following sets of operations are not conflicting
T1        T2             T1        T2
R(X)                     R(X)
          R(X)                     W(Y)
// No write on same object   // No write on same object
Key Points


+ If two conflicting operations are applied in different orders in two schedules, 
the effect can be different on the database or on other transactions in the 
schedule, hence the schedules are not conflict equivalent. 

+ In view equivalence respective transactions in the two schedules read and 
write the same data values while in conflict equivalence, respective 
transactions in two schedules have the same order of conflicting operations. 
Serializable Schedule
A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions.
Saying that a non-serial schedule S is serializable is equivalent to saying that it is correct, because it is equivalent to a serial schedule.
There are two types of serializable schedule 

(i) Conflict serializable schedule 

(ii) View serializable schedule 


Conflict Serializable Schedule 

When a schedule S is conflict equivalent to some serial schedule S', that schedule is called a conflict serializable schedule. In such a case, we can reorder the non-conflicting operations in S until we form the equivalent serial schedule S'.


Serial schedule         Serializable schedule A Serializable schedule B
T1      T2              T1      T2              T1      T2
R(A)                    R(A)                    R(A)
W(A)                    W(A)                    W(A)
R(B)                            R(A)                    R(A)
W(B)                            W(A)            R(B)
        R(A)            R(B)                    W(B)
        W(A)            W(B)                            W(A)
        R(B)                    R(B)                    R(B)
        W(B)                    W(B)                    W(B)
Conflict Equivalence 

The schedules S1 and S2 are said to be conflict equivalent if the following conditions are satisfied
• Both schedules S1 and S2 involve the same set of transactions (including the ordering of operations within each transaction).

• The order of each pair of conflicting actions in S1 and S2 is the same.


Testing for Conflict Serializability of a Schedule 


There is a simple algorithm that can be used to test a schedule for conflict 
serializability. This algorithm constructs a precedence graph (or 
serialization graph), which is a directed graph. 
A precedence graph for a schedule S contains

(i) A node for each committed transaction in S.

(ii) An edge from Ti to Tj, if an action of Ti precedes and conflicts with one of Tj's operations.

















Conflict serializability 


A schedule S is conflict serializable if and only if its precedence graph is acyclic.

(Figure: precedence graph of an example schedule; a cycle exists.)

Note The above example schedule is not a conflict serializable schedule. In general, if the precedence graph for S has no cycle, S is serializable, and several serial schedules can be equivalent to it (each corresponds to a topological ordering of the graph). If the precedence graph has a cycle, it is easy to show that we cannot create any equivalent serial schedule, so S is not serializable.
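The precedence-graph test can be sketched in plain Python. The encoding is an assumption: a schedule is a list of (transaction, "R" or "W", item) in execution order, and every transaction in it is taken to commit. The code adds an edge Ti -> Tj for each conflicting pair in which Ti's operation comes first, then looks for a cycle with depth-first search. Serializable schedule A from the earlier figure passes. A lost-update style schedule fails because it produces both T1 -> T2 and T2 -> T1.

```python
from collections import defaultdict

def precedence_graph(schedule):
    """schedule: list of (txn, kind, item) in execution order, kind "R" or "W".
    Adds an edge Ti -> Tj when an operation of Ti precedes and conflicts with one of Tj."""
    edges = defaultdict(set)
    for i, (ti, ki, xi) in enumerate(schedule):
        for tj, kj, xj in schedule[i + 1:]:
            if ti != tj and xi == xj and "W" in (ki, kj):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    """Depth-first search: reaching a node still on the current path means a cycle."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited, on current path, finished
    colour = defaultdict(int)

    def visit(u):
        colour[u] = GREY
        for v in edges.get(u, ()):
            if colour[v] == GREY or (colour[v] == WHITE and visit(v)):
                return True
        colour[u] = BLACK
        return False

    return any(colour[u] == WHITE and visit(u) for u in list(edges))

def conflict_serializable(schedule):
    return not has_cycle(precedence_graph(schedule))

schedule_a = [("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A"), ("T2", "W", "A"),
              ("T1", "R", "B"), ("T1", "W", "B"), ("T2", "R", "B"), ("T2", "W", "B")]
lost_update = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "W", "A"), ("T2", "W", "A")]
print(conflict_serializable(schedule_a))   # True  (only edge T1 -> T2)
print(conflict_serializable(lost_update))  # False (T1 -> T2 and T2 -> T1)
```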


Key Points


+ Serial schedule is slower but guarantees consistency (correctness). 


+ Every serial schedule is a serializable schedule, but not every serializable schedule is a serial schedule.


+ Being serializable implies that
(i) The schedule is a correct schedule: it will leave the database in a consistent state, and the interleaving is appropriate and will result in a state as if the transactions were serially executed.
(ii) The schedule is an efficient (interleaved) schedule.
View Serializable Schedule
A schedule is view serializable if it is view equivalent to some serial schedule.

Every conflict serializable schedule is view serializable, but not vice-versa.


(Figure: nested sets of schedules: Serial schedules ⊂ Conflict serializable ⊂ View serializable ⊂ All schedules.)



































View Equivalence 
Two schedules S1 and S2 are view equivalent if the following conditions hold for each database object A


(i) If Ti reads the initial value of database object A in schedule S1, then Ti also reads the initial value of A in schedule S2.


(ii) If Ti reads the value of A written by Tj in schedule S1, then Ti also reads the value of A written by Tj in schedule S2.

(iii) If Ti writes the final value of A in schedule S1, then Ti also writes the final value of A in S2.


(Figure: Schedule S1 and Schedule S2.)











In the above example, schedules S1 and S2 are view equivalent, so they are view serializable schedules. However, S1 is not a conflict serializable schedule, while S2 is conflict serializable because no cycle is formed in its precedence graph.
Classification of Schedules Based on Recoverability

• Recoverability addresses the effect of transaction failures on concurrently running transactions.

• Once a transaction T is committed, it should never be necessary to roll back T.


• The schedules that meet this criterion are called recoverable schedules, and those that do not are called non-recoverable.


Recoverable Schedule 


Once a transaction T is committed, it should never be necessary to roll back T. The schedules that meet this criterion are called recoverable schedules, and those that do not are called non-recoverable.

A schedule S is recoverable if no transaction T in S commits until all transactions T' that have written an item that T reads have committed.
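The definition can be checked mechanically. This is a plain-Python sketch with an assumed encoding: each step is (transaction, op, item) with op "R", "W", "C" (commit) or "A" (abort). A read is treated as reading from the last other transaction that wrote the item, and a commit is rejected if any transaction it read from has not yet committed. The first test schedule matches the non-recoverable pattern discussed below: T2 reads T1's uncommitted X and commits first.

```python
def is_recoverable(schedule):
    """Recoverable: no transaction commits before every transaction it read from has committed."""
    last_writer = {}      # item -> transaction that most recently wrote it
    reads_from = {}       # txn -> set of transactions whose writes it read
    committed = set()
    for txn, op, item in schedule:
        if op == "W":
            last_writer[item] = txn
        elif op == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "C":
            if not reads_from.get(txn, set()) <= committed:
                return False  # commits while a transaction it read from is uncommitted
            committed.add(txn)
    return True

non_recoverable = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"),
                   ("T2", "W", "X"), ("T2", "C", None), ("T1", "A", None)]
recoverable = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"),
               ("T2", "W", "X"), ("T1", "C", None), ("T2", "C", None)]
print(is_recoverable(non_recoverable))  # False: T2 commits before T1
print(is_recoverable(recoverable))      # True
```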


Non-recoverable Schedule

T1        T2
R(X)
W(X)
          R(X)
          W(X)
          Commit
Abort

(This schedule is not recoverable because T2 made a dirty read and committed before T1; T1 should have committed first.)

The above schedule is non-recoverable because when the recovery manager rolls back transaction T1, X gets back the value it had before T1 updated it. But T2 has already used the value of X written by T1, and T2 has committed. Consequently, the database is in an inconsistent state.


Comparison of Recoverable and Non-recoverable Schedules

(Figure: the initial non-recoverable schedule of T1 and T2 over data item X, together with two recoverable versions of it. Recoverable Schedule (A): the transaction that made the dirty read commits only after its parent, the transaction whose uncommitted write it read. Recoverable Schedule (B): the dirty read is removed; the schedule is recoverable but not serializable, because the order of conflicting actions has changed.)


Cascadeless Schedule 


These schedules avoid cascading rollbacks. Even if a schedule is recoverable, to recover correctly from the failure of a transaction Ti we may have to roll back several transactions. This phenomenon, in which a single transaction failure leads to a series of transaction rollbacks, is called cascading rollback.


Cascading rollback is undesirable, since it leads to the undoing of a 
significant amount of work. It is desirable to restrict the schedules to those, 
where cascading rollback cannot occur. Such schedules are called 
cascadeless schedules. 
Cascadeless Schedule
T1        T2
R(X)
W(X)
R(Y)
W(Y)
Abort               (a) Rollback
          R(X)      (b) Now T2 reads the value of X that existed before T1 started.
          W(X)
          Commit








Strict Schedule 

A schedule is strict, 

• If neither reading nor overwriting of uncommitted data is allowed.

• Formally, if it satisfies the following conditions whenever a transaction Ti has written a data item X
(i) Tj reads X only after Ti has terminated (aborted or committed).
(ii) Tj writes X only after Ti has terminated (aborted or committed).




Characterizing Schedules through Venn Diagram 








(Figure: Venn diagram of all schedules, showing the regions View serializable, Conflict serializable, Recoverable, Avoid cascading rollback and Strict.)




















Concurrency Control with Locking 

• Previously, we have characterised transaction schedules based on serializability and recoverability.

• If concurrency control with the locking technique is used, then locks prevent multiple transactions from accessing the items concurrently.

• A transaction accesses a data item only if it 'has a lock' on that item.

• Transactions request and release locks.

• The scheduler allows or defers operations based on the lock table.


Lock 


A lock is a variable associated with a data item that describes the status of 
the item with respect to possible operations that can be applied to it. (e.g., 
read lock, write lock). Locks are used as means of synchronizing the 
access by concurrent transactions to the database items. There is one lock 
per database item. 


Problems Arising with Locks 

Two problems can arise when using locks to control the concurrency among transactions

Deadlock Two or more competing transactions are waiting for each other to complete in order to obtain a missing lock.

Starvation A transaction is continually denied access to a given data item.
Purpose of Concurrency Control with Locking

• To enforce isolation among conflicting transactions.

• To preserve database consistency.

• To resolve read-write, write-read and write-write conflicts.

Locking is an operation that secures

• Permission to read

• Permission to write a data item

Lock (X) Data item X is locked on behalf of the requesting transaction. 


Unlocking is an operation which removes these permissions from the data 
item. 


Unlock (X) Data item X is made available to all other transactions. 


Types of Locks 


The two types of Locks are given below 


Binary Locks 
A binary lock has two states 
Locked/Unlocked 


If a database item X is locked, then X cannot be accessed by any other 
database operation. 


Shared/ Exclusive Locks (Read/Write Locks) 

There are three states 

Read locked/Write locked/Unlocked 

Several transactions can access the same item X for reading (shared lock); however, if any of the transactions wants to write the item X, that transaction must acquire an exclusive lock on the item.
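The shared/exclusive rule is usually summarised as a compatibility matrix, and a lock table consults it before granting a request. This is a minimal plain-Python sketch. The LockTable class and its methods are hypothetical, and it leaves out wait queues and deadlock handling. "S" is a shared (read) lock and "X" an exclusive (write) lock.

```python
# Can a requested lock be granted while ANOTHER transaction holds the given lock?
# Keys are (held mode, requested mode).
COMPATIBLE = {
    ("S", "S"): True,    # any number of readers may share an item
    ("S", "X"): False,   # held shared, requested exclusive
    ("X", "S"): False,   # held exclusive, requested shared
    ("X", "X"): False,   # a writer needs the item to itself
}

class LockTable:
    """Hypothetical shared/exclusive lock table: no wait queues, no deadlock handling."""

    def __init__(self):
        self.locks = {}                       # item -> {txn: "S" or "X"}

    def request(self, txn, item, mode):
        """Grant the lock and return True, or return False (the caller must wait)."""
        holders = self.locks.setdefault(item, {})
        for other, held in holders.items():
            if other != txn and not COMPATIBLE[(held, mode)]:
                return False
        # If txn already holds a lock on item, keep the stronger of the two modes.
        holders[txn] = "X" if "X" in (mode, holders.get(txn)) else "S"
        return True

    def unlock(self, txn, item):
        self.locks.get(item, {}).pop(txn, None)

lt = LockTable()
print(lt.request("T1", "A", "S"))  # True: first reader
print(lt.request("T2", "A", "S"))  # True: shared locks are compatible
print(lt.request("T3", "A", "X"))  # False: readers still hold A
lt.unlock("T1", "A")
lt.unlock("T2", "A")
print(lt.request("T3", "A", "X"))  # True: A is now free
print(lt.request("T1", "A", "S"))  # False: T3 holds an exclusive lock
```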


Note Using binary locks or read/write locks in transactions, does not 
guarantee serializability of schedules on its own. 


Key Points


+ Lock and unlock are atomic operations (all or nothing).
+ If every transaction in a schedule follows the two-phase locking protocol, the 
schedule is guaranteed to be serializable. 
Guaranteeing Serializability by Two-phase Locking

A transaction is said to follow the two-phase locking protocol, if all locking 
operations precede the first unlock operation in the transaction. 

Such a transaction can be divided into two phases 

Expanding or Growing Phase During which new locks on items can be 
acquired but none can be released. 


Shrinking Phase During which existing locks can be released but no new 
locks can be acquired. 
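Whether a single transaction obeys the two-phase rule can be checked by scanning its operations once. This is a plain-Python sketch: operations are written as hypothetical strings such as "read_lock(Y)", and the two sample transactions are illustrative. After the first unlock the transaction is in its shrinking phase, so any later lock request breaks 2PL.

```python
LOCK_OPS = {"read_lock", "write_lock", "lock"}

def follows_2pl(ops):
    """ops: one transaction's operations in order, written like "read_lock(Y)".
    Two-phase rule: no lock may be acquired after the first unlock."""
    shrinking = False                    # becomes True at the first unlock
    for op in ops:
        action = op.split("(")[0]        # "read_lock(Y)" -> "read_lock"
        if action == "unlock":
            shrinking = True             # the growing phase is over
        elif action in LOCK_OPS and shrinking:
            return False                 # lock requested in the shrinking phase
    return True

not_2pl = ["read_lock(Y)", "read_item(Y)", "unlock(Y)",
           "write_lock(X)", "read_item(X)", "write_item(X)", "unlock(X)"]
two_phase = ["read_lock(Y)", "read_item(Y)", "write_lock(X)", "unlock(Y)",
             "read_item(X)", "write_item(X)", "unlock(X)"]
print(follows_2pl(not_2pl))    # False: write_lock(X) comes after unlock(Y)
print(follows_2pl(two_phase))  # True
```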


Variations of Two-phase Locking (2PL) Technique 
Basic 2PL (2 Phase Locking) Transaction locks data items 
incrementally. This may cause the problem of deadlock. 

Conservative 2PL (Static 2PL) Prevents deadlock by locking all desired data items before the transaction begins execution. However, it is difficult to use in practice because of the need to predeclare the read-set and write-set, which is not possible in most situations.

Strict 2PL A stricter version of the basic algorithm, in which the write (exclusive) locks of a transaction are released only after it terminates (commits, or aborts and is rolled back). Hence, no other transaction can read or write an item that is written by the transaction until that transaction has terminated.

Rigorous 2PL Strict 2PL is not deadlock free. A more restrictive variation of strict 2PL is rigorous 2PL, which also guarantees strict schedules. In this variation, a transaction does not release any of its locks (exclusive or shared) until after it commits or aborts.


Deadlock 


A deadlock is a condition in which two or more transactions are waiting for each other. An example with two transactions, T1 and T2, is given below.

T1                   T2
Read-lock(Y)
R(Y)
                     Read-lock(X)
                     R(X)
Write-lock(X)
(Waits for X)
                     Write-lock(Y)
                     (Waits for Y)
Deadlock Prevention


A transaction locks all the data items it refers to before it begins execution. This way of locking prevents deadlock, since a transaction never waits for a data item. Conservative two-phase locking uses this approach.


Deadlock Detection and Resolution 


In this approach, deadlocks are allowed to happen. The scheduler 
maintains a wait-for-graph for detecting cycle. If a cycle exists, then one 
transaction involved in the cycle is selected (victim) and rolled back. 


Key Points
+ A wait-for-graph is created using the lock table. As soon as a transaction is 
blocked, it is added to the graph. 
+ When a chain such as Ti waits for Tj, Tj waits for Tk and Tk waits for Ti occurs, it creates a cycle. One of the transactions in the cycle is selected and rolled back.


Deadlock Avoidance 


There are many variations of the two-phase locking algorithm. Some avoid deadlock by not letting a cycle complete: as soon as the algorithm discovers that blocking a transaction is likely to create a cycle, it rolls back the transaction.


The following schemes use transaction time-stamps for the sake of deadlock avoidance


Wait-die scheme (Non-preemptive) 

Older transaction may wait for younger one to release data item. Younger 
transactions never wait for older ones; they are rolled back instead. A 
transaction may die several times before acquiring the needed data item. 


Wound-wait scheme (Preemptive) 

An older transaction wounds (forces the rollback of) a younger transaction instead of waiting for it. Younger transactions may wait for older ones. There may be fewer rollbacks than in the wait-die scheme.
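Both schemes reduce to one comparison of time-stamps when a transaction requests an item that another transaction holds. This is a plain-Python sketch: the function names and the returned strings are hypothetical. It uses the convention stated in the key points below, where a smaller time-stamp means an older transaction.

```python
def wait_die(ts_requester, ts_holder):
    """Non-preemptive. An older requester waits; a younger requester dies (is rolled back)."""
    if ts_requester < ts_holder:          # requester is older
        return "requester waits"
    return "requester rolled back"

def wound_wait(ts_requester, ts_holder):
    """Preemptive. An older requester wounds (rolls back) the younger holder;
    a younger requester waits."""
    if ts_requester < ts_holder:          # requester is older
        return "holder rolled back"
    return "requester waits"

# The older transaction has time-stamp 5, the younger one 10.
print(wait_die(5, 10), "|", wound_wait(5, 10))   # requester waits | holder rolled back
print(wait_die(10, 5), "|", wound_wait(10, 5))   # requester rolled back | requester waits
```

In both schemes only the younger transaction is ever rolled back, and it keeps its original time-stamp when restarted, so it eventually becomes the oldest and cannot starve.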


Starvation 


Starvation occurs when a particular transaction is repeatedly made to wait or is restarted, and never gets a chance to proceed further. In a deadlock resolution scheme, it is possible that the same transaction is consistently selected as the victim and rolled back.
Time-stamp Based Concurrency Control Algorithm
A different approach that guarantees serializability involves using transaction time-stamps to order transaction execution for an equivalent serial schedule.


Time-stamp 


A time-stamp is a unique identifier created by the DBMS to identify a 
transaction. 


Key Points


• A time-stamp is a monotonically increasing variable (integer) indicating the age
of an operation or a transaction.
• A larger time-stamp value indicates a more recent event or operation.


Starvation versus Deadlock

Starvation | Deadlock
---------- | --------
Starvation happens if the same transaction is always chosen as the victim. | A deadlock is a condition in which two or more transactions are waiting for each other.
It occurs if the waiting scheme for locked items is unfair, giving priority to some transactions over others. | It is a situation where two or more transactions are unable to proceed because each is waiting for one of the others to do something.
Starvation is also known as livelock. | Deadlock is also known as circular waiting.
Avoidance: switch priorities so that every transaction has a chance to have high priority, and use FIFO order among competing requests. | Avoidance: acquire locks in a predefined order, or acquire all locks at once before starting.
It means that a transaction goes into a state where it never progresses. | It is a situation where transactions are waiting for each other.


The algorithm associates each database item X with two time-stamp (TS)
values:


Read_TS (X) 


The read time stamp of item X; this is the largest time-stamp among all the 
time-stamps of transactions that have successfully read item X. 
Write_TS (X)
The write time-stamp of item X ; this is the largest time-stamp among all the 
time-stamps of transactions that have successfully written item X. 


There are two time-stamp based concurrency control algorithms.


Basic Time-stamp Ordering (TO) 


Whenever some transaction T tries to issue an R(X) or a W(X) operation, the
basic time-stamp ordering algorithm compares the time-stamp of T with
read_TS (X) and write_TS (X) to ensure that the time-stamp order of
transaction execution is not violated. If this order is violated, then
transaction T is aborted. If T is aborted and rolled back, any transaction T1
that may have used a value written by T must also be rolled back. Similarly,
any transaction T2 that may have used a value written by T1 must also be
rolled back, and so on. This effect is known as cascading rollback.

The concurrency control algorithm must check whether conflicting 
operations violate the time-stamp ordering in the following two cases 


Transaction T issues a W (X) operation 

If read_TS (X) > TS (T) or write_TS (X) > TS (T), then a younger
transaction has already read or written the data item, so abort and roll back T and
reject the operation.

Otherwise, execute W(X) of T and set write_TS (X) to TS (T).


Transaction T issues a R (X) operation 

If write_TS (X) > TS (T), then a younger transaction has already written to
the data item, so abort and roll back T and reject the operation.

If write_TS (X) ≤ TS (T), then execute R(X) of T and set read_TS (X) to the
larger of TS (T) and the current read_TS (X).
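The two checks of basic time-stamp ordering can be sketched as follows. The `Item` record, the `Abort` exception and the convention that time-stamp 0 means "no transaction yet" are assumptions made for illustration. Real transactions are assumed to have time-stamps of 1 or more.

```python
from dataclasses import dataclass

class Abort(Exception):
    """Raised when an operation would violate time-stamp order (T is rolled back)."""

@dataclass
class Item:
    value: object = None
    read_ts: int = 0   # read_TS(X); 0 means no transaction has read X yet
    write_ts: int = 0  # write_TS(X); 0 means no transaction has written X yet

def write(item: Item, ts: int, value: object) -> None:
    # Reject if a younger transaction has already read or written X.
    if item.read_ts > ts or item.write_ts > ts:
        raise Abort(f"T(ts={ts}) must be rolled back")
    item.value, item.write_ts = value, ts

def read(item: Item, ts: int) -> object:
    # Reject if a younger transaction has already written X.
    if item.write_ts > ts:
        raise Abort(f"T(ts={ts}) must be rolled back")
    item.read_ts = max(item.read_ts, ts)
    return item.value

x = Item(value=10)
read(x, ts=2)                # read_TS(X) becomes 2
write(x, ts=3, value=20)     # allowed: write_TS(X) becomes 3
try:
    write(x, ts=1, value=5)  # an older T writes after younger ones read/wrote X
except Abort as e:
    print(e)                 # T(ts=1) must be rolled back
```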


Strict Time-stamp Ordering (TO) 

A variation of basic time-stamp ordering, called strict time-stamp ordering,
ensures that the schedules are both strict (for easy recoverability) and
(conflict) serializable.

Transaction T issues a W(X) operation If TS (T) > read_TS (X), then
delay T until the transaction T′ that wrote or read X has terminated
(committed or aborted).

Transaction T issues an R(X) operation If TS (T) > write_TS (X), then
delay T until the transaction T′ that wrote X has terminated
(committed or aborted).
Thomas’s Write Rule
When transaction T issues a W(X) operation:
(i) If read_TS (X) > TS (T), then abort and roll back T and reject the operation.
(ii) If write_TS (X) > TS (T), then just ignore the write operation and continue execution.
This is because only the most recent write counts when there are two consecutive writes.


If neither condition (i) nor (ii) occurs, then execute W(X) of T and
set write_TS (X) to TS (T).
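The rule can be sketched as a single write check. The dictionary layout for the item and the function name `thomas_write` are illustrative assumptions. Case (ii) returns instead of raising, and that is exactly where the rule differs from basic time-stamp ordering.

```python
class Abort(Exception):
    """Raised when T must be rolled back."""

def thomas_write(item: dict, ts: int, value: object) -> str:
    """Apply Thomas's write rule; item holds 'value', 'read_ts' and 'write_ts'."""
    if item["read_ts"] > ts:          # (i) a younger transaction already read X
        raise Abort(f"T(ts={ts}) must be rolled back")
    if item["write_ts"] > ts:         # (ii) obsolete write: skip it, keep going
        return "ignored"
    item["value"], item["write_ts"] = value, ts
    return "written"

x = {"value": 0, "read_ts": 0, "write_ts": 5}
print(thomas_write(x, 3, "old"))  # ignored (basic TO would abort here)
print(thomas_write(x, 7, "new"))  # written
```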


Multiversion Concurrency Control Techniques 


This approach maintains a number of versions of a data item and allocates
the right version to a read operation of a transaction. Thus, unlike other
mechanisms, a read operation in this mechanism is never rejected.


Side effect Significantly more storage is required to maintain multiple versions. 
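The text does not say how the "right version" is chosen. The sketch below assumes multiversion time-stamp ordering, where a read by T gets the version with the largest write time-stamp that does not exceed TS(T). It also assumes an initial version with write time-stamp 0 always exists, so the read can always be served. The function name and the `(write_TS, value)` pair layout are hypothetical.

```python
from typing import List, Tuple

def read_version(versions: List[Tuple[int, object]], ts: int) -> object:
    """Return the value of the version with the largest write_TS <= TS(T).

    versions holds (write_TS, value) pairs; an initial version with
    write_TS = 0 is assumed, so a suitable version always exists.
    """
    eligible = [(wts, val) for wts, val in versions if wts <= ts]
    return max(eligible, key=lambda v: v[0])[1]

versions_of_x = [(0, 100), (4, 150), (9, 175)]
print(read_version(versions_of_x, 6))   # 150
print(read_version(versions_of_x, 12))  # 175
```

Because older versions are kept, an old transaction still finds a version it is allowed to see. This explains the extra storage cost.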


Validation Based Concurrency Control Protocol 
In this protocol, execution of transaction Ti is done in three phases.

Read and Execution Phase Transaction Ti writes only to temporary local
variables.

Validation Phase Transaction Ti performs a validation test to determine whether
local variables can be written without violating serializability.


Write Phase If Ti is validated, the updates are applied to the database,
otherwise Ti is rolled back.


Note The above three phases of concurrently executing transactions can be 
interleaved, but each transaction must go through the three phases in 
that order. 


Each transaction Ti has three time-stamps:

Start (Ti) The time when Ti started its execution.

Validation (Ti) The time when Ti entered its validation phase.

Finish (Ti) The time when Ti finished its write phase.

The serializability order is determined by the time-stamp given at validation time,
to increase concurrency. Thus, TS (Ti) is given the value of Validation (Ti).

When transaction Tj is validated, then for all Ti with TS (Ti) < TS (Tj), one of the
following conditions must hold:

(i) Finish (Ti) < Start (Tj)
(ii) Start (Tj) < Finish (Ti) < Validation (Tj), and the set of data items written
by Ti does not intersect with the set of data items read by Tj.

If so, validation succeeds and Tj can be committed.
Otherwise, validation fails and Tj is aborted.
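The validation test for Tj can be sketched directly from conditions (i) and (ii). The `Txn` record and its field names are assumptions, and `finish` is set to infinity for a transaction that has not finished its write phase. The `older` list is assumed to contain every Ti with TS(Ti) < TS(Tj).

```python
from dataclasses import dataclass, field
from typing import List, Set

@dataclass
class Txn:
    start: int
    validation: int
    finish: float            # float("inf") while the write phase is unfinished
    read_set: Set[str] = field(default_factory=set)
    write_set: Set[str] = field(default_factory=set)

def validate(tj: Txn, older: List[Txn]) -> bool:
    """Validation test for Tj against every Ti with TS(Ti) < TS(Tj)."""
    for ti in older:
        if ti.finish < tj.start:                        # condition (i)
            continue
        if (tj.start < ti.finish < tj.validation        # condition (ii)
                and not (ti.write_set & tj.read_set)):
            continue
        return False                                    # neither holds: abort Tj
    return True

ti = Txn(start=1, validation=3, finish=5, write_set={"A"})
tj = Txn(start=4, validation=6, finish=float("inf"), read_set={"B"})
print(validate(tj, [ti]))  # True: Ti finished before Tj validated, no overlap
tj.read_set.add("A")
print(validate(tj, [ti]))  # False: Tj read an item that Ti wrote
```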
Multiple Granularity


Multiple granularity allows data items to be of various sizes and defines a hierarchy
of data granularities, where the small granularities are nested within larger ones. It
can be represented graphically as a tree. When a transaction locks a node
in the tree explicitly, it implicitly locks all of that node's descendants in the
same mode.
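Implicit locking can be sketched as a walk from a node up to the root, looking for an explicit lock on any ancestor. The tree below (database, area, file, record) and its node names are hypothetical. The sketch only checks the implicit-locking rule stated above and omits the rest of a full multiple-granularity locking protocol.

```python
from typing import Dict, Optional

# Hypothetical granularity tree: database -> area -> file -> record.
parent: Dict[str, Optional[str]] = {
    "DB": None, "A1": "DB", "F1": "A1", "F2": "A1", "r11": "F1", "r21": "F2",
}
explicit_locks: Dict[str, str] = {"F1": "X"}   # node -> lock mode held explicitly

def effective_lock(node: str) -> Optional[str]:
    """Return the mode in which node is locked, explicitly or via an ancestor."""
    current: Optional[str] = node
    while current is not None:
        if current in explicit_locks:
            return explicit_locks[current]
        current = parent[current]
    return None

print(effective_lock("r11"))  # X (implicitly locked through F1)
print(effective_lock("r21"))  # None (F2's subtree is not locked)
```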


Coarse Granularity The larger the data item size, the lower the degree of 
concurrency. 


Fine Granularity The smaller the data item size, the more locks to be 
managed and stored and the more lock/unlock operations needed. 


Key Points


• Multiple users can access databases and use computer systems simultaneously
because of the concept of multiprogramming, which allows the computer to
execute multiple programs or processes at the same time.


• If the computer system has multiple hardware processors (CPUs), parallel
processing of multiple processes is possible.

• While one transaction is waiting for a page to be read from disk, the CPU can
process another transaction. This may increase system throughput (the
average number of transactions completed in a given time).

• Interleaved execution of a short transaction with a long transaction allows the
short transaction to complete first. This gives improved response time
(average time taken to complete a transaction).

• A transaction may be incomplete because the (database) system crashes or
because it is aborted by either the system or the user (or application).

• Complete transactions are committed.


• Consistency and isolation are primarily the responsibility of the scheduler.
Atomicity and durability are primarily the responsibility of the recovery manager.


Interleaving 


Multiprogramming operating systems execute some commands from one
process, then suspend that process and execute some commands from
the next process, and so on. A process is resumed at the point where it
was suspended whenever it gets its turn to use the CPU again. This
process is known as interleaving.


File Structures 


File structure or organisation refers to the relationship of the key of the 
record to the physical location of that record in the computer file. 


Disk Storage 
Databases must be stored physically as files of records, which are typically 
stored on some computer storage medium. The DBMS software can then 
retrieve, update and process this data as needed. 
Several aspects of storage media must be taken into account:

(i) Speed with which data can be accessed.

(ii) Cost per unit of data.

(iii) Reliability
• Data loss on power failure or system crash.
• Physical failure of the storage device.

So, we can differentiate storage into


Volatile Storage 
Loses its contents when power is switched off.


Non-volatile Storage 
Contents persist even when power is switched off. 


Category of Computer Storage Media 
Computer storage media form a storage hierarchy that includes two main categories. 


• Primary Storage This category includes storage media that can be operated on
directly by the computer's Central Processing Unit (CPU), such as the computer's main
memory and smaller but faster cache memories. Primary storage usually provides
fast access to data but is of limited storage capacity.

• Secondary Storage This category includes magnetic disks, optical disks and tapes.
These devices usually have a larger capacity, cost less and provide slower access
to data than primary storage devices. Data in secondary storage cannot be
processed directly by the CPU.


Characteristics of Secondary Storage Devices 


(i) Random access versus sequential access 
(ii) Read-write, write-once, read-only 
(iii) Character versus block data access 
Storage of Databases


Databases need data to be stored permanently or persistently over long periods of time.
The cost of storage per unit of data is an order of magnitude less for disk
secondary storage than for primary storage.


Magnetic Hard Disk Mechanism 

Magnetic disks are used for storing large amounts of data. The most basic
unit of data on the disk is a single bit of information. All disks are made of
magnetic material shaped as a thin circular disk and protected by a plastic
or acrylic cover. A disk is single-sided if it stores information on only one of its
surfaces, and double-sided if both surfaces are used.


Key Points


• To increase storage capacity, disks are assembled into a disk pack, which
may include many disks.


• Information is stored on a disk surface in concentric circles; each circle is
called a track.


• The disk surfaces are called platter surfaces.


Read-write Head
• Positioned very close to the platter surface (almost touching it).
• Reads or writes magnetically encoded information.


Each Track is Divided into Sectors

• A sector is the smallest unit of data that can be read or written.

• Sector size is typically 512 bytes.

• Typical sectors per track: 500 (on inner tracks) to 1000 (on outer tracks).


Figure: Magnetic hard disk mechanism, showing the arm assembly, read-write head, platter and direction of rotation.
Performance Measures on Disks
Access Time The time it takes from when a read or write request is issued
to when data transfer begins. It consists of:


• Seek time Time it takes to reposition the arm over the correct track.
Average seek time is 1/2 of the worst case seek time.

• Rotational latency Time it takes for the sector to be accessed to appear
under the head. Average latency is 1/2 of the worst case latency.
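A small worked calculation combines the two components. It assumes that average access time is average seek time plus average rotational latency, and that the worst-case latency is one full rotation. The drive figures (16 ms worst-case seek, 7200 rpm) are hypothetical.

```python
def average_access_time_ms(worst_seek_ms: float, rpm: float) -> float:
    """Average access time = average seek time + average rotational latency."""
    avg_seek = worst_seek_ms / 2              # 1/2 of worst-case seek time
    worst_latency = 60_000 / rpm              # one full rotation, in ms
    avg_latency = worst_latency / 2           # 1/2 of worst-case latency
    return avg_seek + avg_latency

# 16 ms worst-case seek -> 8 ms; 7200 rpm -> 8.33 ms per rotation -> 4.17 ms
print(round(average_access_time_ms(16, 7200), 2))  # 12.17
```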

Data Transfer Rate The rate at which the data can be retrieved from or
stored to the disk.


Mean Time To Failure (MTTF) The average time the disk is expected to
run continuously without any failure.


Optimization of Disk-Block Access 

Block A contiguous sequence of sectors from a single track. Data is
transferred between disk and main memory in blocks. Sizes range from
512 bytes to several kilobytes. Choosing the block size involves a trade-off:

• Smaller blocks mean more transfers from disk.

• Larger blocks mean more space wasted due to partially filled blocks.

Disk-arm Scheduling These algorithms order pending accesses to tracks
so that disk-arm movement is minimized.


Elevator Algorithm In this algorithm, the disk arm moves in one direction (from
outer to inner tracks or vice versa), processing the next request in that
direction until there are no more requests in that direction; it then reverses
direction and repeats.
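The service order produced by the elevator algorithm can be sketched from a list of pending track requests. This assumes tracks are numbered so that "moving up" means towards higher track numbers, and that a request at the current head position is served first when moving up. The request list is a hypothetical example.

```python
from typing import List

def elevator_order(requests: List[int], head: int, moving_up: bool = True) -> List[int]:
    """Return the order in which pending track requests are served."""
    if moving_up:
        first = sorted(t for t in requests if t >= head)                   # sweep up
        second = sorted((t for t in requests if t < head), reverse=True)   # then down
    else:
        first = sorted((t for t in requests if t <= head), reverse=True)   # sweep down
        second = sorted(t for t in requests if t > head)                   # then up
    return first + second

print(elevator_order([98, 183, 37, 122, 14, 124, 65, 67], head=53))
# [65, 67, 98, 122, 124, 183, 37, 14]
```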


RAID (Redundant Arrays of Independent Disks) 
The choice of disk structure is very important in databases. Important
factors, besides price, are

(i) Capacity (ii) Speed (iii) Reliability

RAID is a disk organisation technique that manages a large number of disks,
providing a view of a single disk of

• high capacity and high speed, by using multiple disks in parallel, and

• high reliability, by storing data redundantly so that data can be recovered
even if a disk fails.
Storage Access in Databases


A database file is partitioned into fixed length storage units called blocks. 
Blocks are units of both storage allocation and data transfer. The database
system seeks to minimize the number of block transfers between the disk
and memory. The number of disk accesses can be reduced by keeping as
many blocks as possible in main memory.


Buffer 


It is a portion of main memory that is available to store copies of disk
blocks. When several blocks need to be transferred from disk to main
memory and all the block addresses are known, several buffers can be
reserved in main memory to speed up the transfer: while one buffer is being
read or written, the CPU can process data in the other buffer.


Buffer Manager 
It is a subsystem which is responsible for allocating buffer space in main 
memory. 

Programs call on the buffer manager when they need a block from disk. 


(i) If the block is already in the buffer, buffer manager returns the 
address of the block in main memory. 


(ii) If the block is not in the buffer, the buffer manager

• allocates space in the buffer for the block, replacing (throwing out) some
other block, if required, to make space for the new block. The replaced block
is written back to disk only if it was modified since the most recent time that
it was written to/fetched from the disk.

• reads the block from the disk into the buffer and returns the address of the
block in main memory to the requester.

Buffer Replacement Policies Most operating systems replace the block 
that is Least Recently Used (LRU strategy). 

Most Recently Used (MRU) Strategy The system must pin the block currently
being processed. After the final tuple of that block has been processed, the
block is unpinned and it becomes the most recently used block.


Pinned Block Memory block that is not allowed to be written back to disk. 
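The buffer manager's behaviour can be sketched as a small class. It returns a block that is already buffered. Otherwise it evicts the least recently used unpinned block, writes that block back only if it is dirty, and then reads the requested block. The class and its `read_block`/`write_block` callables are hypothetical stand-ins for real disk I/O. The LRU order is kept in an `OrderedDict`.

```python
from collections import OrderedDict
from typing import Callable, Set

class BufferManager:
    """Toy buffer manager: LRU replacement, pinning and dirty write-back."""

    def __init__(self, capacity: int, read_block: Callable[[int], bytes],
                 write_block: Callable[[int, bytes], None]) -> None:
        self.capacity = capacity
        self.read_block, self.write_block = read_block, write_block
        self.frames: "OrderedDict[int, bytes]" = OrderedDict()  # oldest use first
        self.dirty: Set[int] = set()
        self.pinned: Set[int] = set()

    def get(self, block_id: int) -> bytes:
        if block_id in self.frames:                 # (i) already in the buffer
            self.frames.move_to_end(block_id)       # now the most recently used
            return self.frames[block_id]
        if len(self.frames) >= self.capacity:       # (ii) make space first
            victim = next((b for b in self.frames if b not in self.pinned), None)
            if victim is None:
                raise RuntimeError("all buffer frames are pinned")
            data = self.frames.pop(victim)
            if victim in self.dirty:                # write back only if modified
                self.write_block(victim, data)
                self.dirty.discard(victim)
        self.frames[block_id] = self.read_block(block_id)
        return self.frames[block_id]

    def mark_dirty(self, block_id: int, data: bytes) -> None:
        if block_id not in self.frames:
            raise KeyError(block_id)
        self.frames[block_id] = data
        self.dirty.add(block_id)

    def pin(self, block_id: int) -> None:
        self.pinned.add(block_id)                   # may not be thrown out

    def unpin(self, block_id: int) -> None:
        self.pinned.discard(block_id)

disk = {i: f"block{i}".encode() for i in range(5)}
writes = []
bm = BufferManager(2, disk.__getitem__, lambda b, d: writes.append(b))
bm.get(0)
bm.get(1)
bm.mark_dirty(0, b"changed")
bm.get(2)                        # evicts block 0 (LRU) and writes it back
print(list(bm.frames), writes)   # [1, 2] [0]
```

Switching to MRU would only change how `victim` is chosen: scan the frames from the most recently used end instead of the oldest.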


File Organisation 


The database is stored as a collection of files. Each file is a sequence of 
records. A record is a sequence of fields. Data is usually stored in the form 
of records. Records usually describe entities and their attributes. For example, an
employee record represents an employee entity and each field value in the
record specifies some attribute of that employee, such as Name,
Birth-date, Salary or Supervisor.


If every record in the file has exactly the same size (in bytes), the file is said 
to be made up of fixed length records. If different records in the file have 
different sizes, the file is said to be made up of variable length records. 


Spanned versus Unspanned Records 


• The records of a file must be allocated to disk blocks because a block is the unit of
data transfer between disk and memory. When the block size is larger than the
record size, each block will contain numerous records, although some files
may have unusually large records that cannot fit in one block.

• Suppose that the block size is B bytes. For a file of fixed length records of size R bytes,
with B ≥ R, we can fit bfr = ⌊B/R⌋ records per block. The value of bfr is called
the blocking factor.

• In general, R may not divide B exactly, so we have some unused space in each
block equal to B − (bfr × R) bytes.

• To utilize this unused space, we can store part of a record on one block and the rest
on another. A pointer at the end of the first block points to the block containing the
remainder of the record. This organisation is called spanned.

• If records are not allowed to cross block boundaries, the organisation is called
unspanned.
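The blocking-factor arithmetic for an unspanned file of fixed length records is sketched below. It also computes the number of blocks needed for r records, ⌈r/bfr⌉, which the text above does not state. The values B = 512, R = 100 and r = 30000 are hypothetical.

```python
import math

def blocking_stats(B: int, R: int, r: int) -> tuple:
    """Unspanned fixed-length records: (bfr, unused bytes per block, blocks needed)."""
    bfr = B // R                  # bfr = floor(B / R), assuming B >= R
    unused = B - bfr * R          # space left unused in each block
    blocks = math.ceil(r / bfr)   # blocks needed for r records
    return bfr, unused, blocks

print(blocking_stats(B=512, R=100, r=30000))  # (5, 12, 6000)
```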


Allocating File Blocks on Disk 
There are several standard techniques for allocating the blocks of a file on disk 


Contiguous Allocation The file blocks are allocated to consecutive disk 
blocks. This makes reading the whole file very fast. 


Linked Allocation Each file block contains a pointer to the next file block.


Indexed Allocation One or more index blocks contain pointers to
the actual file blocks.


File Headers 


A file header or file descriptor contains information about a file that is 
needed by the system programs that access the file records. 
Operations on Files
Operations on files are given below 


Retrieval Operations These do not change any data in the files but only 
locate certain records so that their field values can be examined and 
processed. 


Update Operations These change the file by insertion or deletion of 
records or by modification of field values. 


Open Prepares the file for reading or writing. Sets the file pointer to the
beginning of the file.


Reset Sets the file pointer of an open file to the beginning of the file. 


Find (or Locate) Searches for the first record that satisfies a search 
condition. Transfers the block containing that record into a main memory 
buffer (if it is not already there). 


Read (or Get) Copies the current record from the buffer to a program
variable in the user program. 


FindNext Searches for the next record in the file that satisfies the search 
condition. 


Delete Deletes the current record and updates the file on disk to reflect the 
deletion. 


Modify Modifies some field values for the current record and updates the 
file on disk to reflect the modification. 


Insert Inserts a new record in the file by locating the block where the record
is to be inserted, transferring that block into the main memory buffer, 
writing the record into the buffer and writing the buffer to disk to reflect the 
insertion. 


Close Completes the file access by releasing the buffers and performing 
any other needed cleanup operations. 


Files of Unordered Records (Heap Files) 


In the simplest type of organisation, records are placed in the file in the
order in which they are inserted, so new records are inserted at the end of
the file. Such an organisation is called a heap or pile file.


This organisation is often used with additional access paths, such as the 
secondary indexes. 


In this type of organisation, inserting a new record is very efficient, but
searching for a record requires a linear search.
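A heap file can be modelled as a list of blocks, each holding at most bfr records. This is a minimal sketch with hypothetical names. Insertion touches only the last block. A search has to scan blocks one by one, and the sketch counts how many blocks it reads.

```python
from typing import List, Optional, Tuple

Block = List[dict]

def heap_insert(blocks: List[Block], record: dict, bfr: int) -> None:
    """Append the record to the last block, starting a new block if it is full."""
    if not blocks or len(blocks[-1]) == bfr:
        blocks.append([])
    blocks[-1].append(record)

def heap_search(blocks: List[Block], key: str, value) -> Tuple[Optional[dict], int]:
    """Linear search; returns (record or None, number of blocks read)."""
    for i, block in enumerate(blocks, start=1):
        for rec in block:
            if rec.get(key) == value:
                return rec, i
    return None, len(blocks)

heap: List[Block] = []
for n in range(1, 8):
    heap_insert(heap, {"id": n}, bfr=3)
print(len(heap), heap_search(heap, "id", 7))  # 3 ({'id': 7}, 3)
```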
Files of Ordered Records (Sorted Files)


• We can physically order the records of a file on disk based on the values
of one of their fields, called the ordering field. This leads to an ordered or
sequential file.

• If the ordering field is also a key field of the file (a field guaranteed to have
a unique value in each record), then the field is called the ordering key for
the file. Binary search is used to search for a record.
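For an ordered file, binary search can be applied to blocks instead of individual records. The sketch compares the search key with the first and last key of the middle block. It assumes each block is a sorted, non-empty list of key values. The block contents are hypothetical.

```python
from typing import List, Tuple

def sorted_file_search(blocks: List[List[int]], key: int) -> Tuple[bool, int]:
    """Binary search over the blocks of a file ordered on its key.

    Returns (found, blocks_read). Each block is a sorted, non-empty list of keys.
    """
    lo, hi, reads = 0, len(blocks) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        block = blocks[mid]
        reads += 1                               # one block transfer
        if key < block[0]:
            hi = mid - 1
        elif key > block[-1]:
            lo = mid + 1
        else:
            return key in block, reads
    return False, reads

file_blocks = [[2, 5, 8], [11, 14, 17], [20, 23, 26], [29, 32, 35]]
print(sorted_file_search(file_blocks, 23))  # (True, 2)
```

A file of b blocks therefore needs about log2(b) block reads, compared with about b/2 on average for the linear search of a heap file.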


Hashing Techniques 


Another type of primary file organisation is based on hashing which 
provides very fast access to records under certain search conditions. This 
organisation is usually called a hash file. A hash file has also been called a 
direct file. 


Key Points


• The search condition must be an equality condition on a single field, called
the hash field.
• If the hash field is also a key field of the file, it is called the hash key.
• A hash function is used to determine the page address for storing records. It is
chosen to distribute records as evenly as possible and to minimize
collisions.








There are various techniques for hashing 


Internal Hashing 


For internal files, hashing is typically implemented as a hash table through
the use of an array of records. If the array index ranges from 0 to M − 1,
then we have M slots in the array.


One common hash function is 
h(K) = K mod M 


This returns the remainder of an integer hash field value K after
division by M.


Collision 


A hash function may not calculate a unique address for every record.
A collision occurs when the hash field value of a record that is being
inserted hashes to an address that already contains a different record. In
this situation, we must insert the new record in some other position, since
its hash address is occupied. The process of finding another position is
called collision resolution.


There are a number of methods for collision resolution:

Collision resolution techniques
• Chaining
• Open addressing
    - Linear probing
    - Quadratic probing
    - Double probing (double hashing)

Classification of collision resolution
Chaining
In this method, all the elements whose keys hash to the same hash table
slot are put in a linked list.
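Chaining with the hash function h(K) = K mod M can be sketched in a few lines. A Python list stands in for each slot's linked list. The class name and the keys used are illustrative.

```python
from typing import List

class ChainedHashTable:
    """Hash table with M slots; keys that collide share one chain per slot."""

    def __init__(self, m: int) -> None:
        self.m = m
        self.slots: List[List[int]] = [[] for _ in range(m)]

    def h(self, k: int) -> int:
        return k % self.m                        # h(K) = K mod M

    def insert(self, k: int) -> None:
        self.slots[self.h(k)].append(k)          # a collision just extends the chain

    def search(self, k: int) -> bool:
        return k in self.slots[self.h(k)]        # only one chain is examined

t = ChainedHashTable(7)
for k in (10, 17, 24, 5):
    t.insert(k)
print(t.slots[3], t.search(17), t.search(31))  # [10, 17, 24] True False
```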
Collision Resolution by Open Addressing

In open addressing, all keys are stored in the locations of the hash table itself.
Each location contains either a key or some other marker indicating that
the particular location is free.

To insert a key into the table, we simply hash the key using the
hash function. If the location is free, the key is inserted into that hash
table location; otherwise, the table is searched in the forward direction
in a systematic manner to find a free slot. The process of finding a slot
in the hash table is called probing.

Linear probing Linear probing uses the following hash function
h(k, i) = [h′(k) + i] mod m, for i = 0, 1, 2, ..., m − 1

where m is the size of the hash table, h′(k) = k mod m is the basic hash function
and i is the probe number.

Quadratic probing Quadratic probing uses the following hash function

h(k, i) = [h′(k) + c1·i + c2·i²] mod m, for i = 0, 1, ..., m − 1
where m is the size of the hash table,
h′(k) = k mod m is the basic hash function,
c1 and c2 ≠ 0 are auxiliary constants and i is the probe number.
Double hashing In double hashing, a second hash function H′ is used for
resolving a collision. Suppose a record R with key K has the hash addresses
H(K) = h and H′(K) = h′ ≠ m.
Then, we linearly search the locations with addresses
h, h + h′, h + 2h′, h + 3h′, ... (each taken mod m)


If m is a prime number, then the above sequence will access all the
locations in the table T.
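The three probe sequences can be compared by inserting the same keys into a table of size m = 11 (a prime). The keys, the constants c1 = 1 and c2 = 3, and the second hash function h2(k) = 1 + (k mod (m − 1)) are illustrative choices, not taken from the text. That h2 never returns 0, so every probe moves.

```python
from typing import Callable, List, Optional

def linear(k: int, i: int, m: int) -> int:
    return (k % m + i) % m                       # [h'(k) + i] mod m

def quadratic(k: int, i: int, m: int, c1: int = 1, c2: int = 3) -> int:
    return (k % m + c1 * i + c2 * i * i) % m     # [h'(k) + c1*i + c2*i^2] mod m

def double(k: int, i: int, m: int) -> int:
    h2 = 1 + k % (m - 1)                         # assumed second hash function
    return (k % m + i * h2) % m                  # h, h + h2, h + 2*h2, ... mod m

def insert(table: List[Optional[int]], k: int,
           probe: Callable[[int, int, int], int]) -> int:
    """Insert k using the given probe sequence; return the slot used."""
    m = len(table)
    for i in range(m):                           # probe numbers i = 0, 1, ..., m - 1
        slot = probe(k, i, m)
        if table[slot] is None:                  # None marks a free location
            table[slot] = k
            return slot
    raise OverflowError("no free slot found")

keys = [10, 22, 31, 4, 15, 28, 17, 88, 59]
for probe in (linear, quadratic, double):
    table: List[Optional[int]] = [None] * 11
    for k in keys:
        insert(table, k, probe)
    print(probe.__name__, table)
```

Linear probing tends to form long runs of occupied slots. The other two schemes spread colliding keys further apart.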


External Hashing for Disk Files 


Hashing for disk files is called external hashing. To suit the characteristics of
disk storage, the target address space is made of buckets, each of which
holds multiple records. A bucket is either one disk block or a cluster of
contiguous blocks. The hashing function maps a key into a relative bucket
number, rather than assigning an absolute block address to the bucket. A
table maintained in the file header converts the bucket number into the
corresponding disk block address. The collision problem is less severe
with buckets, because as many records as will fit in a bucket can hash to
the same bucket without causing problems.
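The two-step mapping from key to relative bucket number to disk block address is sketched below. The bucket count, the bucket capacity, the block addresses in the header table and the single shared overflow list are all hypothetical simplifications. Real systems typically chain overflow records per bucket.

```python
from typing import Dict, List, Optional

BUCKETS = 4                     # assumed number of buckets
CAPACITY = 2                    # assumed records per bucket (one block)

# Table kept in the file header: relative bucket number -> disk block address.
bucket_to_block: Dict[int, int] = {0: 1200, 1: 1201, 2: 1344, 3: 1345}
blocks: Dict[int, List[int]] = {addr: [] for addr in bucket_to_block.values()}
overflow: List[int] = []        # simplistic shared overflow area

def insert(key: int) -> Optional[int]:
    """Hash key to a bucket, look up its block address and store the key there."""
    bucket = key % BUCKETS                      # relative bucket number
    addr = bucket_to_block[bucket]              # header table lookup
    if len(blocks[addr]) < CAPACITY:
        blocks[addr].append(key)
        return addr
    overflow.append(key)                        # bucket full
    return None

for k in (5, 9, 13, 6):
    insert(k)
print(blocks[1201], overflow)  # [5, 9] [13]
```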


Indexing Structures for Files 

Indexing mechanisms are used to optimize certain accesses to data
(records) managed in files. For example, the author catalog in a library is a type of
index. Search key (definition) An attribute or combination of attributes used to
look up records in a file.


An index file consists of records (called index entries) of the form:


Search key value | Pointer to block in data file


Indexing structures for files 














Index files are typically much smaller than the original file because only the 
values for search key and pointer are stored. The most prevalent types of 
indexes are based on ordered files (single-level indexes) and tree data 
structures (multilevel indexes). 


Types of Single Level Ordered Indexes 


In an ordered index file, index entries are stored sorted by the search key
value. There are several types of ordered indexes:
Primary Index

A primary index is an ordered file whose records are of fixed length with two
fields. The first field is of the same data type as the ordering key field, called
the primary key, of the data file, and the second field is a pointer to a disk
block (a block address).


Key Points


• There is one index entry in the index file for each block in the data file.

• Indexes can also be characterised as dense or sparse.

• Dense index A dense index has an index entry for every search key value in
the data file.

• Sparse index A sparse (non-dense) index, on the other hand, has index entries
for only some of the search values.

• A primary index is a non-dense (sparse) index, since it includes an entry for
each disk block of the data file rather than for every search value.
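A lookup through a sparse primary index can be sketched as a binary search over the index entries. This assumes each entry stores the key of the first record in its data block, and it uses `bisect` from the Python standard library. The index contents are hypothetical.

```python
import bisect
from typing import List

# Sparse primary index: one entry per data block = (first key in block, block number).
index_keys: List[int] = [2, 11, 20, 29]      # sorted, one per block
index_blocks: List[int] = [0, 1, 2, 3]

def block_for(key: int) -> int:
    """Return the data block that would contain key (binary search on the index)."""
    pos = bisect.bisect_right(index_keys, key) - 1   # last entry with key <= search key
    if pos < 0:
        raise KeyError(key)                           # smaller than every indexed key
    return index_blocks[pos]

print(block_for(23), block_for(11), block_for(35))  # 2 1 3
```

Only that single data block then has to be read and searched, even though the index holds far fewer entries than the file has records.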


Clustering Index 

If file records are physically ordered on a non-key field which does not have
a distinct value for each record, that field is called the clustering field. We
can create a different type of index, called a clustering index, to speed up
retrieval of records that have the same value for the clustering field.


Key Points
• A clustering index is also an ordered file with two fields. The first field is of
the same type as the clustering field of the data file.
• The second field in the clustering index is a block pointer.
• A clustering index is another example of a non-dense index.


Secondary Index 

A secondary index provides a secondary means of accessing a file for 
which some primary access already exists. The secondary index may be 
on a field which is a candidate key and has a unique value in every record 
or a non-key with duplicate values. The index is an ordered file with two 
fields. The first field is of the same data type as some non-ordering field of 
the data file that is an indexing field. The second field is either a block 
pointer or a record pointer. 


A secondary index usually needs more storage space and longer search 
time than does a primary index. 
Multilevel Indexes

The idea behind a multilevel index is to reduce the part of the index that has
to be searched at each step. A multilevel index considers the index file, which
will now be referred to as the first (or base) level of a multilevel index, as an
ordered file with a distinct value for each key. Therefore, we can create a primary
index for the first level; this index to the first level is called the second level
of the multilevel index, and so on.
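The number of levels can be estimated by repeatedly indexing the level below until a level fits in one block. The sketch assumes each index block holds fo entries (the fan-out) and that each level has one entry per block of the level below. The figures r1 = 30000 and fo = 68 are hypothetical.

```python
def multilevel_levels(r1: int, fo: int) -> list:
    """Entries per level when each index block holds fo entries.

    Level 1 has r1 entries; each higher level has one entry per block
    of the level below, until a level fits in a single block.
    """
    levels = [r1]
    while levels[-1] > fo:
        levels.append(-(-levels[-1] // fo))   # ceiling division: blocks below
    return levels

print(multilevel_levels(30000, 68))  # [30000, 442, 7]  -> 3 levels
```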


Dynamic Multilevel Indexes Using
B-Trees and B+-Trees


There are two such dynamic multilevel indexes:


B-Trees 

• When the data volume is large and does not fit in memory, the B-tree serves as an
extension of the binary search tree to a disk-based environment.

• In fact, since the B-tree is always balanced (all leaf nodes appear at the
same level), it is an extension of the balanced binary search tree.

• The problem that the B-tree aims to solve is: given a large collection of
objects, each having a key and a value, design a disk-based index
structure that efficiently supports queries and updates.

• A B-tree of order p, when used as an access structure on a key field to
search for records in a data file, can be defined as follows:

(i) Each internal node in the B-tree is of the form

<P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <K(q−1), Pr(q−1)>, Pq>

where q ≤ p.
Each Pi is a tree pointer to another node in the B-tree.
Each Pri is a data pointer to the record whose search key field value
is equal to Ki.

(ii) Within each node, K1 < K2 < ... < K(q−1).

(iii) Each node has at most p tree pointers.

(iv) Each node, except the root and leaf nodes, has at least ⌈p/2⌉ tree
pointers.

(v) A node with q tree pointers, q ≤ p, has q − 1 search key field values
(and hence has q − 1 data pointers).


For example, consider a B-tree of order p = 3 into which the values 8, 5,
1, 7, 3, 12, 9, 6 were inserted in that order.
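The example can be reproduced with a small insertion routine. A node may hold at most p − 1 keys, and an overflowing node is split around its middle key, which moves up to the parent. This splitting policy is a common choice and is an assumption here. The `Node` class and function names are hypothetical, and the data pointers Pri are omitted.

```python
from typing import List, Optional, Tuple

class Node:
    def __init__(self, keys: Optional[List[int]] = None,
                 children: Optional[List["Node"]] = None) -> None:
        self.keys = keys or []            # K1 < K2 < ... (data pointers omitted)
        self.children = children or []    # tree pointers; empty for a leaf

def _insert(node: Node, key: int, p: int) -> Optional[Tuple[int, Node]]:
    """Insert key below node; return (median, new right node) if node split."""
    i = sum(1 for k in node.keys if k < key)    # position of key in this node
    if node.children:                           # internal node: descend first
        split = _insert(node.children[i], key, p)
        if split is None:
            return None
        median, right = split
        node.keys.insert(i, median)
        node.children.insert(i + 1, right)
    else:
        node.keys.insert(i, key)
    if len(node.keys) <= p - 1:                 # at most p - 1 keys per node
        return None
    mid = len(node.keys) // 2                   # overflow: split around the median
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    median = node.keys[mid]
    node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
    return median, right

def insert(root: Node, key: int, p: int) -> Node:
    split = _insert(root, key, p)
    if split is None:
        return root
    median, right = split
    return Node([median], [root, right])        # the tree grows at the root

root = Node()
for v in (8, 5, 1, 7, 3, 12, 9, 6):
    root = insert(root, v, p=3)
print(root.keys, [c.keys for c in root.children])  # [5, 8] [[1, 3], [6, 7], [9, 12]]
```

Splits propagate upwards and the tree only gains height at the root. This is why all leaf nodes stay at the same level.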
B+ Trees

• It is a variation of the B-tree data structure.

• In a B-tree, every value of the search field appears once at some level in
the tree, along with a data pointer. In a B+-tree, data pointers are stored
only at the leaf nodes of the tree. Hence, the structure of the leaf nodes
differs from the structure of the internal nodes.


• The pointers in the internal nodes are tree pointers to blocks that are tree
nodes, whereas the pointers in leaf nodes are data pointers.


B+-Tree's Structure
The structure of the B+-tree of order p is as follows:

• Each internal node is of the form <P1, K1, P2, K2, ..., P(q−1), K(q−1), Pq>,
where q ≤ p and each Pi is a tree pointer.

• Within each internal node, K1 < K2 < ... < K(q−1).

• Each internal node has at most p tree pointers and, except the root, has at least
⌈p/2⌉ tree pointers. The root node has at least two tree pointers if it is an internal node.

• Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, ..., <K(q−1), Pr(q−1)>, Pnext>,
where q ≤ p, each Pri is a data pointer and Pnext points to the next leaf node of the
B+-tree.


Operating System 


Operating System 


An operating system acts as an intermediary between the user of a
computer and the computer hardware. An Operating System (OS) is
software that manages the computer hardware.
Peripheral interfacing in operating system


Components of a Computer System 

Hardware It provides the basic computing resources for the system. It 
consists of CPU, memory and the input/output (I/O) devices. 

Application Programs Define the ways in which these resources are used
to solve users’ computing problems, e.g., word processors, spreadsheets,
compilers and web browsers.
Functions of Operating System


Operating system provides many functions for ensuring the efficient 
operation of the system itself. Some functions are listed below 


Resource Allocation Allocation of resources to the various processes is 
managed by operating system. 


Accounting The operating system may also be used to keep track of the
various computer resources: which users are using them and how much
of each resource they use.


Protection Protection ensures that all access to the system resources is
controlled. When several processes execute concurrently, it should not be
possible for one process to interfere with the others or with the operating
system itself.


Operating System Services 
Many services are provided by OS to the user’s programs. 
Some of the OS services are listed below 


Program Execution The operating system helps to load a program into 
memory and run it. 


I/O Operations Each running program may request I/O operations, and for
efficiency and protection, users cannot control I/O devices directly.
Thus, the operating system must provide some means to do I/O
operations.


File System Manipulation Files are an important part of the system: programs
need to read and write files, and files may also be created and deleted by
name. The operating system is responsible for file management.


Communications Often, one process needs to exchange
information with another process. This exchange of information can take
place between processes executing on the same computer, or between
processes executing on different computer systems tied together by a
computer network. The operating system takes care of all these forms of
communication.


Error Detection The operating system must be aware of possible errors
and should take the appropriate action to ensure correct and consistent
computing.
Batch Systems


Early computers were large machines run from a console. The users of batch systems did not interact directly with the computer system. Rather, the user prepared a job, which consisted of programs, data and control information, and submitted it to the computer operator. The job was usually in the form of punch cards. After some time, perhaps minutes, hours or days, the output was produced.


Key Points
+ The main drawback of batch systems is the lack of interaction between the user and the job while it is executing.
+ Multiprogramming creates logical parallelism.


Multiprogramming 


It is the technique of keeping several programs in memory at the same time and interleaving their execution on the CPU, so that the computer appears to do several things at once.

The concept of multiprogramming is that the OS keeps several jobs in memory simultaneously. The operating system selects a job from the job pool and starts executing it. When that job needs to wait for an input/output operation, the CPU is switched to another job. So, the main idea is that the CPU is never idle as long as at least one job is ready to execute.


Multitasking 


Multitasking is the logical extension of multiprogramming. The concept is quite similar to multiprogramming, but the difference is that the switching between jobs occurs so frequently that the users can interact with each program while it is running.

This concept is also known as a time sharing system. A time shared operating system uses CPU scheduling and multiprogramming to provide each user with a small portion of a time shared computer.


Real Time Systems 


A real time operating system is used in environments where a large number of events, mostly external to the computer system, must be accepted and processed in a short time or within certain deadlines.


Real time systems are used when there are rigid time requirements on the flow of data or the operation of a processor; therefore, they are often used as control devices in dedicated applications.
Distributed Systems


In distributed systems, the computation is distributed among several processors. Each processor in a distributed system has its own local memory; the processors do not share memory or a clock.


A distributed operating system governs the operation of a 
distributed computer system and provides a virtual machine abstraction to 
its users. 


Key Points


+ A pool of jobs on disk allows the OS to select which job to run next, to increase CPU utilization.

+ If jobs come directly on cards (or on magnetic tape), they run sequentially on an FCFS basis, but when several jobs are on direct access devices such as disks, job scheduling is possible.

+ The most important aspect of job scheduling is the ability to multiprogram.

+ If several jobs are ready to be brought into memory and there is not enough room for all of them, the system must choose among them. Making this decision is job scheduling.

+ When the OS selects a job from the job pool, it loads that job into memory. Having several programs resident in memory at the same time calls for a memory management scheme.

+ If several jobs are ready to run at the same time, the system must choose among them. This decision is called CPU scheduling.


Threads 


A thread is a basic unit of CPU utilisation. A thread comprises a thread ID, a program counter, a register set and a stack. It shares with other threads belonging to the same process its code section, data section and other operating system resources, such as open files and signals. A traditional process has a single thread of control. If a process has multiple threads of control, it can perform more than one task at a time.


Multithreading 


An application is typically implemented as a separate process with several threads of control. In some situations, a single application may be required to perform several similar tasks, e.g., a web server accepts client requests for web pages, images, sound and so on. A busy web server may have many clients concurrently accessing it. If the web server ran as a traditional single threaded process, it would be able to service only one client at a time.


The amount of time that a client might have to wait for its request to be serviced could be enormous. So, it is more efficient to have one process that contains multiple threads serving the same purpose.


With this approach the web server process is multithreaded: the server creates a separate thread that listens for client requests, and when a request is made, rather than creating another process, it creates another thread to service the request.
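A minimal Python sketch of the thread-per-request design described above. It does not open real sockets: the incoming requests are a hypothetical list of paths, and `handle_request` and `serve` are illustrative names. Each request is serviced by a new thread of the same process, so all threads share the `responses` dictionary (data section) while each keeps its own local variables on its own stack.

```python
import threading

def handle_request(request: str, responses: dict, lock: threading.Lock) -> None:
    # Runs on its own thread: 'reply' lives on this thread's private stack.
    reply = f"served {request}"
    with lock:                      # 'responses' is shared by all threads of the process
        responses[request] = reply

def serve(requests: list) -> dict:
    responses: dict = {}
    lock = threading.Lock()
    workers = []
    for req in requests:
        # Rather than creating another process, create another thread.
        worker = threading.Thread(target=handle_request, args=(req, responses, lock))
        worker.start()
        workers.append(worker)
    for worker in workers:
        worker.join()               # wait until every request has been serviced
    return responses

print(serve(["/index.html", "/logo.png", "/song.mp3"]))
```

A real server would accept requests in a loop rather than from a list; the list only keeps the sketch self-contained.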


Multithreading Model 


There are two types of threads 
(i) User threads (ii) Kernel threads 


Kernel Threads 
Kernel threads are supported and managed directly by the operating system. 


User Threads 

They are supported above the kernel and are managed without kernel support.
There are three common ways of establishing a relationship between user threads and kernel threads


(a) Many-to-many model (b) One-to-one model 
(c) Many-to-one model 
• One-to-one model maps each user thread to a corresponding kernel thread.


• Many-to-many model multiplexes many user threads to a smaller or equal number of kernel threads.


• Many-to-one model maps many user threads to a single kernel thread.


Key Points


+ User level threads are threads that are visible to the programmer and unknown to the kernel.


+ User level threads are faster to create and manage than kernel threads.


Process 


A process is a program in execution. A process is more than the program code, i.e., the text section. It also includes the current activity, as represented by the value of the program counter and the contents of the processor's registers.


Max  Stack  Contains temporary data (such as function parameters, return addresses and local variables).
     (free space: the stack grows down and the heap grows up into it)
     Heap   Memory that is dynamically allocated during process run time.
     Data   Data section contains global variables.
0    Text   Code section.

Process in Memory


Each process is represented in the OS by a Process Control Block (PCB), also called a task control block.







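new → ready (admitted)
ready → running (scheduler dispatch)
running → ready (interrupt)
running → waiting (I/O or event wait)
waiting → ready (I/O or event completion)
running → terminated (exit)

Diagram of a process state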
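The state diagram can be encoded as a small transition table. This Python sketch is illustrative only: the `State` enum, the `TRANSITIONS` table and the `move` helper are hypothetical names, and the "exit" label for running to terminated is assumed because the diagram residue does not show it.

```python
from enum import Enum

class State(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    TERMINATED = "terminated"

# Allowed transitions, labelled with the events from the state diagram.
TRANSITIONS = {
    (State.NEW, State.READY): "admitted",
    (State.READY, State.RUNNING): "scheduler dispatch",
    (State.RUNNING, State.READY): "interrupt",
    (State.RUNNING, State.WAITING): "I/O or event wait",
    (State.WAITING, State.READY): "I/O or event completion",
    (State.RUNNING, State.TERMINATED): "exit",   # label assumed
}

def move(current: State, target: State) -> State:
    """Return the new state, or raise if the diagram has no such edge."""
    if (current, target) not in TRANSITIONS:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target

# One possible life cycle of a process.
state = State.NEW
for nxt in (State.READY, State.RUNNING, State.WAITING,
            State.READY, State.RUNNING, State.TERMINATED):
    state = move(state, nxt)
print(state)   # State.TERMINATED
```

Note that a waiting process cannot go straight back to running: it must first return to the ready queue and be dispatched again.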
As processes enter the system, they are put into a job queue, which consists of all processes in the system. The processes that are residing in main memory and are ready and waiting to execute are kept on a list called the ready queue.


Process state
Process number
Program counter
Registers
Memory limits
List of open files
Process control block
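As a rough illustration, a PCB can be modelled as a record holding the fields listed above, and the ready queue as a FIFO of PCBs. The `PCB` class, its field types and the default values are assumptions made for this Python sketch; a real kernel stores far more state.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class PCB:
    # Fields follow the process control block figure; the types are assumptions.
    process_number: int
    process_state: str = "new"
    program_counter: int = 0
    registers: dict = field(default_factory=dict)
    memory_limits: tuple = (0, 0)
    open_files: list = field(default_factory=list)

job_queue = [PCB(pid) for pid in range(3)]   # all processes in the system
ready_queue = deque()                        # processes in memory, ready to run
for pcb in job_queue:
    pcb.process_state = "ready"
    ready_queue.append(pcb)                  # link the PCB onto the tail

running = ready_queue.popleft()              # the head of the queue gets the CPU
running.process_state = "running"
print(running.process_number, [p.process_number for p in ready_queue])   # 0 [1, 2]
```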


Key Points
+ The list of processes waiting for a particular I/O device is called a device queue. Each device has its own device queue.
+ An I/O bound process is one that spends more of its time doing I/O than doing computations.
+ A CPU bound process is one that spends more of its time doing computations than doing I/O activities.


Schedulers 


A process migrates among various scheduling queues throughout its lifetime. For scheduling purposes, the OS must select processes from these queues in some fashion. The selection is carried out by the appropriate scheduler.


Long Term and Short Term Schedulers 

• A long term scheduler, or job scheduler, selects processes from the job pool (a mass storage device, where processes are kept for later execution) and loads them into memory for execution.

• A short term scheduler, or CPU scheduler, selects from among the processes in main memory that are ready to execute and allocates the CPU to one of them.

• The long term scheduler controls the degree of multiprogramming (the number of processes in memory).


Note The mid-term scheduler uses the concept of swapping: it can remove a process from memory and later bring it back in to continue execution.


Dispatcher 
It is the module that gives control of the CPU to the process selected by the short term scheduler. This function involves the following
• Switching context
• Switching to user mode
• Jumping to the proper location in the user program to restart that program.


Scheduling Algorithm 


Selecting one of the processes that are in main memory and ready to execute is known as scheduling; after selection, that process gets control of the CPU.


Scheduling Criteria 
The criteria for comparing CPU scheduling algorithms include the following 
CPU Utilization Means keeping the CPU as busy as possible. 


Throughput A measure of work, i.e., the number of processes that are completed per time unit.


Turnaround Time The interval from the time of submission of a process to the time of its completion. It is the sum of the periods spent waiting to get into memory, waiting in the ready queue, executing on the CPU and doing I/O.


Waiting Time The sum of the periods spent waiting in the ready queue. 


Response Time The time from the submission of a request until the first 
response is produced. 
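For a process P_i with arrival time A_i, CPU burst B_i, time of first dispatch S_i and completion time C_i, the criteria above can be written as follows. This assumes each process has a single CPU burst and does no I/O, so all of its non-running time is spent in the ready queue, and it approximates the first response by the first dispatch.

```latex
\begin{aligned}
\text{Turnaround time: } T_i &= C_i - A_i \\
\text{Waiting time: } W_i &= T_i - B_i \\
\text{Response time: } R_i &= S_i - A_i \\
\text{Average waiting time: } \bar{W} &= \frac{1}{n}\sum_{i=1}^{n} W_i
\end{aligned}
```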


Key Points
+ Response time is the time a process takes to start responding, not the time it takes to output the response.


+ It is desirable to maximize CPU utilization and throughput and to minimize turnaround time, waiting time and response time.


There are many CPU scheduling algorithms as given below 


First Come First Served (FCFS) Scheduling


With this scheme, the process that requests the CPU first is allocated the CPU first. The implementation of the FCFS policy is easily managed with a FIFO queue. When a process enters the ready queue, its PCB (Process Control Block) is linked onto the tail of the queue. When the CPU is free, it is allocated to the process at the head of the queue. The running process is then removed from the queue.
Process Table (all processes arrive at time 0, in the order P1, P2, P3)

Process   Burst Time (in milliseconds)
P1        24
P2        3
P3        3

Gantt chart
| P1 | P2 | P3 |
0    24   27   30

Waiting time for P1 = 0 ms
Waiting time for P2 = 24 ms
Waiting time for P3 = 27 ms
Average waiting time = (0 + 24 + 27)/3 = 17 ms

Turnaround time for P1 = 24 - 0 = 24 ms
Turnaround time for P2 = 27 - 0 = 27 ms
Turnaround time for P3 = 30 - 0 = 30 ms
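A minimal Python sketch of FCFS for processes that all arrive at time 0, assuming the order of the dictionary is the arrival order. The function name `fcfs` is illustrative. It reproduces the waiting and turnaround times of the example.

```python
def fcfs(bursts: dict) -> tuple:
    """Non-preemptive FCFS for processes that all arrive at time 0,
    served in the order the dict lists them (assumed arrival order)."""
    clock = 0
    waiting, turnaround = {}, {}
    for name, burst in bursts.items():
        waiting[name] = clock          # time spent in the ready queue
        clock += burst                 # the process runs to completion
        turnaround[name] = clock       # completion - arrival (arrival = 0)
    return waiting, turnaround

w, t = fcfs({"P1": 24, "P2": 3, "P3": 3})
print(w, sum(w.values()) / len(w))     # {'P1': 0, 'P2': 24, 'P3': 27} 17.0
```

The average depends heavily on arrival order: `fcfs({"P2": 3, "P3": 3, "P1": 24})` gives waiting times 0, 3 and 6 ms, an average of 3 ms.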


Shortest Job First (SJF) Scheduling 


When the CPU is available, it is assigned to the process that has the smallest next CPU burst. If two processes have the same next CPU burst, FCFS scheduling is used to break the tie.


Process Table (all processes arrive at time 0)

Process   Burst Time (in milliseconds)
P1        6
P2        8
P3        7
P4        3

Gantt Chart
| P4 | P1 | P3 | P2 |
0    3    9    16   24

Waiting time for P1 = 3
Waiting time for P2 = 16
Waiting time for P3 = 9
Waiting time for P4 = 0
Average waiting time = (3 + 16 + 9 + 0)/4 = 7 ms
A more appropriate term for this scheduling method would be the shortest 
next CPU burst algorithm because scheduling depends on the length of 
the next CPU burst of a process rather than its total length. 
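A minimal Python sketch of non-preemptive SJF, assuming all processes arrive at time 0 and that the next CPU burst of each process is known in advance (in practice it must be predicted). The function name `sjf` is illustrative. Python's `sorted` is stable, so processes with equal bursts keep their listed order, which gives the FCFS tie-break.

```python
def sjf(bursts: dict) -> dict:
    """Non-preemptive SJF; all processes arrive at time 0.
    sorted() is stable, so equal bursts keep their listed (FCFS) order."""
    clock, waiting = 0, {}
    for name, burst in sorted(bursts.items(), key=lambda item: item[1]):
        waiting[name] = clock          # waits until every shorter job is done
        clock += burst
    return waiting

w = sjf({"P1": 6, "P2": 8, "P3": 7, "P4": 3})
print(w, sum(w.values()) / len(w))     # {'P4': 0, 'P1': 3, 'P3': 9, 'P2': 16} 7.0
```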


Preemptive and Non-preemptive Algorithm 


The SJF algorithm can either be preemptive or non-preemptive. The choice 
arises when a new process arrives at the ready queue while a previous 
process is still executing. The next CPU burst of the newly arrived process 
may be shorter than what is left of the currently executing process. 
A preemptive SJF algorithm will preempt the currently executing process, whereas a non-preemptive SJF algorithm will allow the currently running process to finish its CPU burst. Preemptive SJF scheduling is sometimes called shortest-remaining-time-first scheduling.


Process Table

Process   Arrival Time   Burst Time (in milliseconds)
P1        0              8
P2        1              4
P3        2              9
P4        3              5

Gantt Chart
| P1 | P2 | P4 | P1 | P3 |
0    1    5    10   17   26

Waiting time for P1 = (10 - 1) = 9
Waiting time for P2 = (1 - 1) = 0
Waiting time for P3 = 17 - 2 = 15
Waiting time for P4 = 5 - 3 = 2
Average waiting time = (9 + 0 + 15 + 2)/4 = 6.5 ms
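A minimal Python sketch of preemptive SJF (shortest-remaining-time-first), simulated one millisecond at a time so that a newly arrived shorter job can preempt the running one. The function name `srtf` and the `(arrival, burst)` input format are illustrative assumptions; ties go to the process listed first.

```python
def srtf(procs: dict) -> dict:
    """Shortest-remaining-time-first, simulated in 1 ms steps.
    procs maps name -> (arrival_time, burst_time). Returns waiting times."""
    remaining = {p: burst for p, (_, burst) in procs.items()}
    finish = {}
    clock = 0
    while remaining:
        ready = [p for p in remaining if procs[p][0] <= clock]
        if not ready:                  # CPU idle until the next arrival
            clock += 1
            continue
        current = min(ready, key=lambda p: remaining[p])   # shortest remaining time
        remaining[current] -= 1        # run it for one time unit
        clock += 1
        if remaining[current] == 0:
            del remaining[current]
            finish[current] = clock
    # waiting time = completion - arrival - burst
    return {p: finish[p] - arr - burst for p, (arr, burst) in procs.items()}

w = srtf({"P1": (0, 8), "P2": (1, 4), "P3": (2, 9), "P4": (3, 5)})
print(w, sum(w.values()) / len(w))     # {'P1': 9, 'P2': 0, 'P3': 15, 'P4': 2} 6.5
```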


Priority Scheduling 


A priority is associated with each process and the CPU is allocated to the 
process with the highest priority. Equal priority processes are scheduled in 
FCFS order. 


Priorities may be defined so that low numbers represent high priority, or so that low numbers represent low priority. Which convention applies depends on the question, so one of the two must be assumed.


Key Points


+ Priority scheduling can be either preemptive or non-preemptive.

+ A preemptive priority scheduling algorithm will preempt the CPU if the priority of the newly arrived process is higher than the priority of the currently running process.

+ A non-preemptive priority scheduling algorithm will simply put the new process at the head of the ready queue.
A major problem with priority scheduling algorithms is indefinite blocking, or starvation: a low priority process may wait indefinitely while higher priority processes keep getting the CPU.


Process Table (all processes arrive at time 0; a low number means a high priority)

Process   Burst Time   Priority
P1        10           3
P2        1            1
P3        2            4
P4        1            5
P5        5            2

Gantt Chart
| P2 | P5 | P1 | P3 | P4 |
0    1    6    16   18   19

Waiting time for P1 = 6
Waiting time for P2 = 0
Waiting time for P3 = 16
Waiting time for P4 = 18
Waiting time for P5 = 1
Average waiting time = (6 + 0 + 16 + 18 + 1)/5 = 8.2 ms
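A minimal Python sketch of non-preemptive priority scheduling, assuming all processes arrive at time 0 and that a lower number means a higher priority (the convention used in the example). The function name and the `(burst, priority)` input format are illustrative. Equal priorities keep their listed order, i.e. FCFS.

```python
def priority_schedule(procs: dict) -> dict:
    """Non-preemptive priority scheduling; all processes arrive at time 0.
    procs maps name -> (burst_time, priority); lower number = higher priority."""
    clock, waiting = 0, {}
    for name, (burst, _) in sorted(procs.items(), key=lambda item: item[1][1]):
        waiting[name] = clock
        clock += burst
    return waiting

w = priority_schedule({"P1": (10, 3), "P2": (1, 1), "P3": (2, 4),
                       "P4": (1, 5), "P5": (5, 2)})
print(sum(w.values()) / len(w))        # 8.2
```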





Round Robin (RR) Scheduling 


The RR scheduling algorithm is designed especially for time sharing 
systems. It is similar to FCFS scheduling but preemption is added to switch 
between processes. A small unit of time called a time quantum or time 
slice is defined. 


The ready queue is treated as a circular queue. The CPU scheduler goes around the ready queue, allocating the CPU to each process for a time interval of up to 1 time quantum. If a process has a CPU burst of less than 1 time quantum, the process itself releases the CPU voluntarily, and the scheduler then proceeds to the next process in the ready queue.


Otherwise, if the CPU burst of the currently running process is longer than 1 time quantum, the timer will go off and cause an interrupt to the operating system. A context switch will be executed and the process will be put at the tail of the ready queue. The CPU scheduler will then select the
next process in the ready queue.
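The round robin procedure can be sketched in Python with a `deque` as the circular ready queue. This assumes all processes arrive at time 0 in the listed order and ignores context-switch time; the function name `round_robin` is illustrative. Applied to the example that follows (quantum = 4 ms) it gives waiting times of 6, 4 and 7 ms.

```python
from collections import deque

def round_robin(bursts: dict, quantum: int) -> dict:
    """Round robin for processes arriving at time 0 in the listed order.
    Returns each process's waiting time (context switches cost nothing here)."""
    queue = deque(bursts)                     # circular ready queue of names
    remaining = dict(bursts)
    clock, finish = 0, {}
    while queue:
        name = queue.popleft()
        run = min(quantum, remaining[name])   # shorter bursts release the CPU early
        clock += run
        remaining[name] -= run
        if remaining[name] == 0:
            finish[name] = clock
        else:
            queue.append(name)                # timer interrupt: back to the tail
    return {p: finish[p] - bursts[p] for p in bursts}

w = round_robin({"P1": 24, "P2": 3, "P3": 3}, quantum=4)
print(w)                                      # {'P1': 6, 'P2': 4, 'P3': 7}
```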
Process Table

Process   Burst Time (in milliseconds)
P1        24
P2        3
P3        3

Let's take time quantum = 4 ms. Then the resulting RR schedule is as follows

| P1 | P2 | P3 | P1 | P1 | P1 | P1 | P1 |
0    4    7    10   14   18   22   26   30

P1 waits for 6 ms (10 - 4), P2 waits for 4 ms and P3 waits for 7 ms.
Thus,
Average waiting time = (6 + 4 + 7)/3 = 17/3 ≈ 5.67 ms


Multilevel Queue Scheduling


This scheduling algorithm has been created for situations in which processes are easily classified into different groups, e.g., a division between foreground (interactive) processes and background (batch) processes. A multilevel queue scheduling algorithm partitions the ready queue into several separate queues. The processes are permanently assigned to one queue, generally based on some property of the process, such as memory size, process priority or process type.

Multilevel queue (from highest priority to lowest priority: system processes, interactive processes, interactive editing processes, batch processes, student processes)


Multilevel Feedback Queue Scheduling


This scheduling algorithm allows a process to move between queues. The idea is to separate processes according to the characteristics of their CPU bursts.


If a process uses too much CPU time, it will be moved to a lower priority 
queue. Similarly, a process that waits too long in a lower priority queue may 
be moved to a higher priority queue. This form of aging prevents starvation. 
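A minimal Python sketch of the demotion rule of a multilevel feedback queue. The configuration is a hypothetical one chosen for illustration: three queues with quanta of 8 ms and 16 ms and a final FCFS queue, all processes arriving at time 0. Aging (promotion of long-waiting processes) is not modelled here.

```python
from collections import deque

def mlfq(bursts: dict, quanta: tuple = (8, 16)) -> dict:
    """Hypothetical multilevel feedback queue; all processes arrive at time 0.
    Queue i < len(quanta) uses time quantum quanta[i]; the last queue is FCFS.
    A process that uses its whole quantum without finishing moves down one
    queue. Returns the queue level in which each process finished."""
    queues = [deque() for _ in range(len(quanta) + 1)]
    queues[0].extend(bursts)                  # every process starts in the top queue
    remaining = dict(bursts)
    level = {name: 0 for name in bursts}
    while any(queues):
        lvl = next(i for i, q in enumerate(queues) if q)   # highest non-empty queue
        name = queues[lvl].popleft()
        if lvl == len(quanta):
            run_for = remaining[name]         # FCFS: run to completion
        else:
            run_for = min(quanta[lvl], remaining[name])
        remaining[name] -= run_for
        if remaining[name] > 0:               # used its full quantum: demote
            level[name] = lvl + 1
            queues[lvl + 1].append(name)
    return level

print(mlfq({"A": 5, "B": 20, "C": 40}))       # {'A': 0, 'B': 1, 'C': 2}
```

Short, interactive-style bursts finish in the top queue, while CPU-hungry processes sink to the lower priority queues.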
Synchronization


A situation in which several processes access and manipulate the same data concurrently, and the outcome of the execution depends on the particular order in which the accesses take place, is called a race condition.


e.g., Suppose we have two variables A and B and a shared variable Result. The operations on A and B are as follows

Operation 1        Operation 2
A = Result         B = Result
A = A + 1          B = B - 1
Result = A         Result = B

Now, initially let Result = 4 and let the sequence be operation 1, then operation 2. Then,

A = Result = 4
A = 4 + 1 = 5
Result = A = 5
B = Result = 5
B = 5 - 1 = 4
Result = B = 4

Now, if the sequence of operations is changed to operation 2, then operation 1. Then,

B = Result = 4
B = B - 1 = 4 - 1 = 3
Result = B = 3
A = Result = 3
A = A + 1 = 4
Result = A = 4


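Both serial orders leave Result = 4. If, however, the two operations are interleaved, for example both A = Result and B = Result read the value 4 before either operation writes back, then Result ends up as 5 or 3 depending on which write happens last. Because the outcome depends on the order of access, this is a race condition. To prevent it, we need to ensure that only one process at a time can be manipulating the variable Result. To make such a guarantee, we require that the processes be synchronized in some way.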
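The effect of interleaving can be reproduced deterministically in Python by treating each line of an operation as one indivisible step. This is a sketch under that assumption: each operation is a generator that pauses after every step, and `run` takes a hypothetical schedule string such as "121212", where each character says which operation performs its next step.

```python
def operation1(shared: dict):
    a = shared["result"]      # A = Result
    yield
    a = a + 1                 # A = A + 1
    yield
    shared["result"] = a      # Result = A

def operation2(shared: dict):
    b = shared["result"]      # B = Result
    yield
    b = b - 1                 # B = B - 1
    yield
    shared["result"] = b      # Result = B

def run(schedule: str, initial: int = 4) -> int:
    """Execute the two operations step by step in the given order."""
    shared = {"result": initial}
    ops = {"1": operation1(shared), "2": operation2(shared)}
    for who in schedule:
        next(ops[who], None)  # advance that operation by one step
    return shared["result"]

print(run("111222"), run("222111"), run("121212"), run("212121"))   # 4 4 3 5
```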
Inter-Process Communication (IPC)


Processes executing concurrently in the operating system may be either independent processes or cooperating processes. A process is independent if it cannot affect or be affected by the other processes executing in the system. Any process that shares data with other processes is a cooperating process. Advantages of process cooperation are information sharing, computation speedup, modularity and the convenience of working on many tasks at the same time. Cooperating processes require an Inter-Process Communication (IPC) mechanism that will allow them to exchange data and information.
There are two fundamental models of IPC

(i) Shared memory (ii) Message passing
In the shared memory model, a region of memory that is shared by cooperating processes is established. Processes can then exchange information by reading and writing data to the shared region.


In the message passing model, communication takes place by means of 
messages exchanged between the cooperating processes. 



























































Communication models: message passing (process A sends message M to process B through the kernel) and shared memory (processes A and B read and write a shared region)


The Critical Section Problem 


Consider a system consisting of n processes {P0, P1, ..., Pn-1}. Each process has a segment of code, called a critical section, in which the process may be changing common variables, updating a table, writing a file and so on.

The important feature of the system is that, when one process is executing in its critical section, no other process is allowed to execute in its critical section. That is, no two processes are executing in their critical sections at the same time.

• The critical section problem is to design a protocol that the processes can use to cooperate. Each process must request permission to enter its critical section.

• The section of code implementing this request is the entry section. The critical section may be followed by an exit section. The remaining code is the remainder section.
General structure of a typical process Pi is given below
do {
    entry section
        critical section
    exit section
        remainder section
} while (TRUE);
A solution to the critical section problem must satisfy the following three requirements

















Mutual Exclusion 
If a process Pi is executing in its critical section, then no other processes can be executing in their critical sections.


Progress 

If no process is executing in its critical section and some processes wish to 
enter their critical sections, then only those processes that are not executing 
in their remainder sections can participate in deciding which will enter its 
critical section next and this selection can’t be postponed indefinitely. 


Bounded Waiting 

There exists a bound, or limit, on the number of times that other processes are allowed to enter their critical sections after a process has made a request to enter its critical section and before that request is granted.


Semaphores 
A semaphore is a synchronization tool. A semaphore S is an integer variable that, apart from initialization, is accessed only through two standard atomic operations, wait() and signal(). The wait() operation tests the integer value of S (S <= 0) and possibly modifies it (S--).
wait(S)
{
    while (S <= 0)
        ;   // no operation (busy wait)
    S--;
}
The signal() operation increments the integer value of S (S++).
signal(S)
{
    S++;
}
All modifications to the integer value of the semaphore in the wait() and signal() operations must be executed indivisibly: when one process modifies the semaphore value, no other process can simultaneously modify that same value.
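A minimal Python sketch of a counting semaphore with the wait()/signal() interface described above. It is a blocking variant rather than the busy-waiting definition: a waiting thread sleeps on a `threading.Condition`, whose internal lock makes each operation indivisible. The class name is illustrative; Python's standard library already provides `threading.Semaphore` with `acquire()`/`release()`.

```python
import threading

class Semaphore:
    """Counting semaphore with wait()/signal(); waiters sleep instead of spinning."""

    def __init__(self, value: int = 1):
        self._value = value
        self._cond = threading.Condition()   # its lock makes each operation indivisible

    def wait(self) -> None:
        with self._cond:
            while self._value <= 0:          # same test as 'while (S <= 0)'
                self._cond.wait()            # sleep until a signal() arrives
            self._value -= 1                 # S--

    def signal(self) -> None:
        with self._cond:
            self._value += 1                 # S++
            self._cond.notify()              # wake one waiting thread, if any

# A thread blocks on a semaphore initialized to 0 until another thread signals.
gate = Semaphore(0)
seen = []
t = threading.Thread(target=lambda: (gate.wait(), seen.append("ran")))
t.start()
gate.signal()
t.join()
print(seen)   # ['ran']
```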


Usage of Semaphores 


• We can use binary semaphores to deal with the critical section problem for multiple processes. The n processes share a semaphore, mutex, initialized to 1.

• The value of a binary semaphore can range only between 0 and 1.

• Binary semaphores are known as mutex locks, as they are locks that provide mutual exclusion.

Mutual exclusion implementation with semaphores
do {
    wait(mutex);
        // critical section
    signal(mutex);
        // remainder section
} while (TRUE);
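The same structure in runnable Python, as a sketch: `threading.Semaphore(1)` plays the role of the binary semaphore `mutex`, `acquire()` corresponds to wait(mutex) and `release()` to signal(mutex). The shared `counter` and the thread count are illustrative. Because every increment happens inside the critical section, no update is lost.

```python
import threading

mutex = threading.Semaphore(1)     # binary semaphore, initialized to 1
counter = 0                        # shared variable updated in the critical section

def process(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        mutex.acquire()            # wait(mutex): entry section
        counter += 1               # critical section
        mutex.release()            # signal(mutex): exit section
        # remainder section would go here

threads = [threading.Thread(target=process, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)                     # 40000
```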


Key Points


+ Counting semaphores can be used to control access to a given resource 
consisting of a finite number of instances. 


+ The value of a counting semaphore can range over an unrestricted domain. 
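As an illustration of a counting semaphore guarding a resource with a finite number of instances, this Python sketch assumes a hypothetical pool of two identical printers and six threads that want one. `threading.BoundedSemaphore(2)` starts at 2, so at most two threads can hold an instance at any moment; the `peak` counter records the most that were ever in use together.

```python
import threading
import time

INSTANCES = 2                                        # hypothetical: two identical printers
printers = threading.BoundedSemaphore(INSTANCES)     # counting semaphore, starts at 2
in_use = 0                                           # instances currently held
peak = 0                                             # largest value in_use ever reached
counter_lock = threading.Lock()                      # protects in_use and peak

def use_printer() -> None:
    global in_use, peak
    with printers:                                   # wait(): blocks while both are taken
        with counter_lock:
            in_use += 1
            peak = max(peak, in_use)
        time.sleep(0.01)                             # pretend to print
        with counter_lock:
            in_use -= 1
    # leaving the 'with printers' block performs signal()

workers = [threading.Thread(target=use_printer) for _ in range(6)]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(peak)                                          # never more than INSTANCES
```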


Classical Problems of Process Synchronization
1. Producer-consumer problem 2. Readers-writers problem
3. Dining-philosophers problem


Producer-Consumer Problem 
In this problem, there are two processes
(i) producer process (ii) consumer process.


A shared (global) buffer is defined, which is accessible by both the producer and the consumer.


• The producer process produces items and appends them to the buffer.
• The consumer process consumes items from the buffer.
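One common way to solve the producer-consumer problem uses three semaphores: `mutex` for mutual exclusion on the buffer, `empty` counting free slots and `full` counting filled slots. This Python sketch assumes that classic scheme and a hypothetical buffer size of 3; the producer blocks on a full buffer and the consumer blocks on an empty one.

```python
import threading
from collections import deque

BUFFER_SIZE = 3                            # hypothetical finite buffer size
buffer = deque()                           # shared (global) buffer
mutex = threading.Semaphore(1)             # mutual exclusion on the buffer
empty = threading.Semaphore(BUFFER_SIZE)   # counts free slots
full = threading.Semaphore(0)              # counts filled slots
consumed = []

def producer(items) -> None:
    for item in items:
        empty.acquire()                    # wait(empty): block if the buffer is full
        mutex.acquire()
        buffer.append(item)                # append the item to the buffer
        mutex.release()
        full.release()                     # signal(full)

def consumer(count: int) -> None:
    for _ in range(count):
        full.acquire()                     # wait(full): block if the buffer is empty
        mutex.acquire()
        item = buffer.popleft()            # take an item from the buffer
        mutex.release()
        empty.release()                    # signal(empty)
        consumed.append(item)

items = list(range(10))
p = threading.Thread(target=producer, args=(items,))
c = threading.Thread(target=consumer, args=(len(items),))
p.start()
c.start()
p.join()
c.join()
print(consumed)                            # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```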
The Readers-Writers Problem

Suppose that a database is to be shared among several concurrent processes. Some of these processes may want only to read the database (we call these readers), whereas other processes may want to update (that is, to read and write) the database (we call these writers). Obviously, if two readers access the shared data simultaneously, no adverse effects will result.

• However, if a writer and some other process (either a reader or a writer) access the database simultaneously, chaos may ensue.

• To ensure that these difficulties do not arise, we require that the writers have exclusive access to the shared database while writing to it.


• This synchronization problem is referred to as the readers-writers
problem.
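A Python sketch of one classic solution (often called the first readers-writers solution, in which readers have priority). It is an assumption, not something the text above prescribes: `rw_mutex` is held by a writer or by the group of active readers, and `mutex` protects `read_count`. The first reader locks writers out and the last reader lets them back in.

```python
import threading

rw_mutex = threading.Semaphore(1)   # held by a writer, or by the group of readers
mutex = threading.Semaphore(1)      # protects read_count
read_count = 0
database = {"value": 0}
log = []                            # values observed by readers

def writer(new_value: int) -> None:
    rw_mutex.acquire()              # exclusive access while writing
    database["value"] = new_value
    rw_mutex.release()

def reader() -> None:
    global read_count
    mutex.acquire()
    read_count += 1
    if read_count == 1:             # first reader locks out writers
        rw_mutex.acquire()
    mutex.release()

    log.append(database["value"])   # many readers may read at the same time

    mutex.acquire()
    read_count -= 1
    if read_count == 0:             # last reader lets writers in
        rw_mutex.release()
    mutex.release()

threads = [threading.Thread(target=reader) for _ in range(5)]
threads.insert(2, threading.Thread(target=writer, args=(42,)))
for t in threads:
    t.start()
for t in threads:
    t.join()
print(database["value"], len(log))  # 42 5
```

A semaphore (rather than a lock owned by one thread) is used for `rw_mutex` because it may be acquired by one reader and released by a different one.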


Key Points
+ The producer and consumer processes cannot access the buffer simultaneously.
+ Mutual exclusion must be enforced on the buffer.


+ The producer cannot append an item to a full buffer (finite buffer), and the consumer cannot take an item from an empty buffer.


The Dining-Philosophers Problem 

Consider five philosophers who spend their lives thinking and eating. The 
philosophers share a circular table surrounded by 5 chairs, each belonging 
to one philosopher. In the center of the table is a bowl of rice and the table 
is laid with 5 chopsticks. When a philosopher thinks, he doesn’t interact 
with his colleagues. 


From time to time, a philosopher gets hungry and tries to pick up the 2 chopsticks that are closest to him (the chopsticks that are between him and his left and right neighbours). A philosopher may pick up only one chopstick at a time. Obviously, he can't pick up a chopstick that is already in the hand of a neighbour. When a hungry philosopher has both his chopsticks at the same time, he eats without releasing them. When he is finished, he puts down both of his chopsticks and starts thinking again.

This problem is a simple representation of the need to allocate several 
resources among several processes in a deadlock free and starvation free 
manner. 
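The text does not fix a solution, so this Python sketch shows one standard deadlock-free approach, resource ordering: every philosopher picks up the lower-numbered of his two chopsticks first. Since all philosophers acquire chopsticks in the same global order, a circular wait cannot form. Each chopstick is modelled as a `threading.Lock`, and the number of rounds is illustrative. This prevents deadlock but does not by itself guarantee fairness.

```python
import threading

N = 5
chopsticks = [threading.Lock() for _ in range(N)]
meals = [0] * N

def philosopher(i: int, rounds: int) -> None:
    left, right = i, (i + 1) % N
    # Resource ordering: always pick up the lower-numbered chopstick first,
    # so a cycle of philosophers each holding one chopstick cannot form.
    first, second = min(left, right), max(left, right)
    for _ in range(rounds):
        # think ...
        with chopsticks[first]:
            with chopsticks[second]:
                meals[i] += 1          # eat while holding both chopsticks
        # both chopsticks are put down here

threads = [threading.Thread(target=philosopher, args=(i, 100)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(meals)                           # [100, 100, 100, 100, 100]
```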


Deadlock 


In a multiprogramming environment, the permanent blocking of a set of processes that either compete for system resources or communicate with each other is called a deadlock. The deadlock problem involves conflicting needs for resources by two or more processes.


Necessary Conditions for Deadlock 


A deadlock situation can arise, if the following four conditions hold 
simultaneously in a system. 


Mutual Exclusion 

Resources must be allocated to processes at any time in an exclusive 
manner and not on a shared basis for a deadlock to be possible. If another 
process requests that resource, the requesting process must be delayed 
until the resource has been released. 


Hold and Wait Condition 
A process that is holding at least one resource may request additional resources without giving up (releasing) the resources it already holds. If this is not true, a deadlock can never take place.


No Preemption Condition 
Resources can’t be preempted. A resource can be released only voluntarily 
by the process holding it, after that process has completed its task. 


Circular Wait Condition 

There must exist a set {P0, P1, P2, ..., Pn} of waiting processes such that P0 is waiting for a resource that is held by P1, P1 is waiting for a resource that is held by P2, ..., Pn-1 is waiting for a resource that is held by Pn, and Pn is waiting for a resource that is held by P0.


State diagram of circular wait condition (process P1 requests resource R1, which is held by process P2; P2 requests resource R2, which is held by P1)







Resource Allocation Graph 
The resource allocation graph consists of a set of vertices V and a set of edges E.
The set of vertices V is partitioned into two types
(i) P = {P1, P2, ..., Pn}, the set consisting of all the processes in the system.
(ii) R = {R1, R2, ..., Rm}, the set consisting of all resource types in the system.
• A directed edge Pi → Rj is known as a request edge.
• A directed edge Rj → Pi is known as an assignment edge.
Resource instances in the example graph: one instance of resource type R1, two instances of resource type R2, one instance of resource type R3 and three instances of resource type R4.

Example of resource allocation graph


Process States 

• Process P1 is holding an instance of resource type R2 and is waiting for an instance of resource type R1.

• Process P2 is holding an instance of R1 and R2 and is waiting for an instance of resource type R3.

• Process P3 is holding an instance of R3.

• Basic facts related to resource allocation graphs are given below


Note If the graph contains no cycle, there is no deadlock in the system.


If the graph contains a cycle
(i) If there is only one instance per resource type, then there is a deadlock.


(ii) If there are several instances per resource type, then there may or may not be a deadlock.
Deadlock Handling Strategies
1. Deadlock prevention 2. Deadlock avoidance 
3. Deadlock detection 


Deadlock Prevention 
Deadlock prevention is a set of methods for ensuring that at least one of the necessary conditions cannot hold.


Deadlock Avoidance 

A deadlock avoidance algorithm dynamically examines the resource 
allocation state to ensure that a circular wait condition can never exist. The 
resource allocation state is defined by the number of available and 
allocated resources and the maximum demands of the processes. 


Safe State 


A state is safe if the system can allocate resources to each process (up to its maximum) in some order and still avoid a deadlock.


A system is in a safe state if there exists a safe sequence of all processes. A deadlock state is an unsafe state, but not all unsafe states cause deadlocks.

Safe, unsafe and deadlock state spaces (the deadlock states lie inside the unsafe region)
Banker’s Algorithm 


Data structures for the Banker's algorithm, where n is the number of processes and m is the number of resource types:
• Available Vector of length m. If Available[j] = k, there are k instances of resource type Rj available.
• Max n × m matrix. If Max[i, j] = k, then process Pi may request at most k instances of resource type Rj.
• Allocation n × m matrix. If Allocation[i, j] = k, then Pi is currently allocated k instances of Rj.
• Need n × m matrix. If Need[i, j] = k, then Pi may need k more instances of Rj to complete its task.
  Need[i, j] = Max[i, j] - Allocation[i, j]
• Safety Algorithm
Step 1 Let Work and Finish be vectors of length m and n, respectively. Initialize
  Work = Available
  Finish[i] = False (for i = 1, 2, ..., n)
Step 2 Find an i such that both
  (a) Finish[i] = False
  (b) Need_i ≤ Work
  If no such i exists, go to step 4.
Step 3 Work = Work + Allocation_i
  Finish[i] = True
  Go to step 2.
Step 4 If Finish[i] = True for all i, then the system is in a safe state.
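A minimal Python sketch of the safety algorithm, using plain lists for the vectors and matrices; the function name `is_safe` is illustrative. It scans the processes repeatedly, letting any process whose Need fits in Work finish and release its allocation. With the Allocation, Max and Available values from the example below, this particular scan order finds the safe sequence <P1, P3, P4, P0, P2>; a state can have more than one safe sequence, so a hand-worked answer may list a different one.

```python
def is_safe(available, max_need, allocation):
    """Safety algorithm of the Banker's algorithm.
    available: list of m ints; max_need, allocation: n x m lists.
    Returns (safe, sequence), sequence being the order processes can finish."""
    n, m = len(allocation), len(available)
    need = [[max_need[i][j] - allocation[i][j] for j in range(m)] for i in range(n)]
    work = list(available)                     # Step 1: Work = Available
    finish = [False] * n
    sequence = []
    progress = True
    while progress:
        progress = False
        for i in range(n):                     # Step 2: Finish[i] false and Need_i <= Work
            if not finish[i] and all(need[i][j] <= work[j] for j in range(m)):
                for j in range(m):             # Step 3: Pi finishes, releases resources
                    work[j] += allocation[i][j]
                finish[i] = True
                sequence.append(i)
                progress = True
    return all(finish), sequence               # Step 4

allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
max_need = [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]]
print(is_safe([3, 3, 2], max_need, allocation))   # (True, [1, 3, 4, 0, 2])
```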


Example Banker’s Algorithm

Process   Allocation   Max     Available
          A B C        A B C   A B C
P0        0 1 0        7 5 3   3 3 2
P1        2 0 0        3 2 2
P2        3 0 2        9 0 2
P3        2 1 1        2 2 2
P4        0 0 2        4 3 3

Need = Max - Allocation
Thus, we have

Process   Need (A B C)
P0        (7-0) (5-1) (3-0) = 7 4 3
P1        (3-2) (2-0) (2-0) = 1 2 2
P2        (9-3) (0-0) (2-2) = 6 0 0
P3        (2-2) (2-1) (2-1) = 0 1 1
P4        (4-0) (3-0) (3-2) = 4 3 1

As the available resources are (3 3 2), the process P1 with need (1 2 2) can be executed. When P1 finishes, it releases the resources allocated to it.
Available resources = Available + Allocation of P1
  (3 3 2) + (2 0 0) = (5 3 2)


Now, available resources are (5 3 2).
The next process that can be executed with the available resources is P3 (need 0 1 1). Thus, P3 will execute next.
Now, available resources = Available resources + Allocation of P3
  (5 3 2) + (2 1 1) = (7 4 3)
Now, available resources are (7 4 3). The next process will be P4 (need 4 3 1).
  (7 4 3) + (0 0 2) = (7 4 5)
The next process that will be executed is P2 (need 6 0 0).
  (7 4 5) + (3 0 2) = (10 4 7)
Finally, P0 (need 7 4 3) is executed.
  (10 4 7) + (0 1 0) = (10 5 7)


Key Points


+ The sequence <P1, P3, P4, P2, P0> is a safe sequence, so it ensures that deadlock will never occur.

+ If a safe sequence can be found, the processes can run concurrently and will never cause a deadlock.

+ First in first out (FIFO) scheduling is based on queuing.

+ Shortest seek time first scheduling is designed for maximum throughput in most scenarios.

+ RR scheduling involves extensive overhead, especially with a small time quantum.

+ The kernel is the one program running at all times on the computer. It is the part of the operating system that loads first. In simple words, the kernel provides the communication between system software and hardware.
Deadlock Detection
The system is allowed to enter a deadlock state, and then it uses
(i) a detection algorithm and (ii) a recovery scheme.


Single instance of each resource type
(i) Maintain a wait-for graph.
    (a) Nodes are processes. (b) Pi → Pj if Pi is waiting for Pj.
(ii) Periodically invoke an algorithm that searches for a cycle in the graph.
(iii) An algorithm to detect a cycle in a graph requires an order of n² operations, where n is the number of vertices in the graph.
Resource allocation graph and corresponding wait-for graph
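A cycle in a wait-for graph can be found with a depth-first search that marks the processes on the current search path. This Python sketch represents the graph as a dictionary mapping each process to the processes it waits for (an assumed format). Its cost is proportional to vertices plus edges, which is of order n² when a graph with n vertices is dense, consistent with the bound above.

```python
def has_cycle(wait_for: dict) -> bool:
    """Depth-first search for a cycle in a wait-for graph.
    An edge Pi -> Pj (Pj in wait_for[Pi]) means Pi is waiting for Pj."""
    WHITE, GREY, BLACK = 0, 1, 2          # unvisited / on current path / finished
    colour = {p: WHITE for p in wait_for}

    def visit(p) -> bool:
        colour[p] = GREY
        for q in wait_for.get(p, []):
            if colour.get(q, WHITE) == GREY:              # back edge: cycle found
                return True
            if colour.get(q, WHITE) == WHITE and visit(q):
                return True
        colour[p] = BLACK
        return False

    return any(colour[p] == WHITE and visit(p) for p in wait_for)

print(has_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))   # True: deadlock
print(has_cycle({"P1": ["P2"], "P2": ["P3"], "P3": []}))       # False
```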

















Recovery Scheme
• Resource preemption
• Process termination
  (i) Abort all deadlocked processes.
  (ii) Abort one process at a time until the deadlock cycle is eliminated.


Memory Management 


Memory management techniques allow several processes to share 
memory. When several processes are in memory they can share the CPU, 
thus increasing CPU utilisation. 


Overlays 

This technique keeps in memory only those instructions and data that are required at a given time.

Other instructions and data are loaded into the memory space occupied by the previous ones when they are needed.


Swapping 
Consider an environment that supports multiprogramming using, say, the
Round Robin (RR) CPU scheduling algorithm. When one process has
finished executing for one time quantum, it is swapped out of memory to a
backing store.


(Figure: Swapping of processes between the user space of main memory and the backing store; the operating system remains resident in main memory)
The memory manager then picks up another process from the 
backing store and loads it into the memory occupied by the previous 
process. Then, the scheduler picks up another process and allocates the 
CPU to it. 
Logical versus Physical Address Space


An address generated by the CPU is commonly referred to as a logical address,
whereas an address seen by the memory unit is commonly referred to as a physical
address.





(Figure: Schematic view of logical versus physical address: the logical address generated by the CPU is added to the relocation register (14000) to give the physical address sent to memory)
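The relocation-register mapping in the figure amounts to one addition per memory reference. The sketch below assumes the register value 14000 from the figure and a hypothetical logical address of 346; a real MMU would usually also compare the logical address against a limit register, which is omitted here.

```python
RELOCATION_REGISTER = 14000  # value shown in the figure


def to_physical(logical_address, relocation=RELOCATION_REGISTER):
    """Map a CPU-generated logical address to a physical address (dynamic relocation)."""
    if logical_address < 0:
        raise ValueError("logical addresses start at 0")
    # The memory-management unit adds the relocation register to every address the CPU issues.
    return relocation + logical_address


print(to_physical(346))  # 14346
```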


Memory Management Techniques 


The main memory must accommodate both the operating system and the 
various user processes. The parts of the main memory must be allocated in 
the most efficient way possible. 


There are two ways for memory allocation as given below 


Single Partition Allocation


The memory is divided into two parts: one is used by the OS and the other
is for user programs. The OS code and data are protected from being
modified by user programs using a base register.
(Figure: Single partition allocation, with memory from 0 to 1024 k divided into an OS area and a user area)


Multiple Partition Allocation 
The multiple partition allocation may be further classified as 


Fixed Partition Scheme 

Memory is divided into a number of fixed size partitions, and each
partition holds one process. This scheme supports multiprogramming, as a
number of processes may now be brought into memory and the CPU can
be switched from one process to another.


When a process arrives for execution, it is put into the input queue of the 
smallest partition, which is large enough to hold it. 
Variable Partition Scheme

A block of available memory is called a hole. At any time, a set of
holes exists, consisting of holes of various sizes scattered throughout
the memory.


When a process arrives and needs memory, this set of holes is searched 
for a hole which is large enough to hold the process. If the hole is too large, 
it is split into two parts. The unused part is added to the set of holes. All 
holes which are adjacent to each other are merged. 
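The split-and-merge bookkeeping described above can be sketched with a sorted list of (start, length) holes. This is a minimal sketch under assumptions: addresses and sizes are hypothetical units, and the hole is chosen first fit (the search strategies are discussed next).

```python
def allocate(holes, size):
    """Take `size` units from the first hole large enough; holes is a sorted list of (start, length)."""
    for i, (start, length) in enumerate(holes):
        if length >= size:
            if length == size:
                holes.pop(i)  # the hole is used up exactly
            else:
                holes[i] = (start + size, length - size)  # the unused part stays a hole
            return start
    return None  # no single hole is large enough


def release(holes, start, size):
    """Return a block to the set of holes and merge holes that are adjacent."""
    holes.append((start, size))
    holes.sort()
    merged = []
    for s, length in holes:
        if merged and merged[-1][0] + merged[-1][1] == s:
            prev_start, prev_len = merged[-1]
            merged[-1] = (prev_start, prev_len + length)
        else:
            merged.append((s, length))
    holes[:] = merged


holes = [(0, 100), (200, 50)]  # hypothetical free list
p = allocate(holes, 30)
print(p, holes)  # 0 [(30, 70), (200, 50)]
release(holes, p, 30)
print(holes)     # [(0, 100), (200, 50)]
```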




Searching for Hole in the Set 
There are several algorithms which are used to search for a hole in the set.
• First fit This allocates the first hole in the set that is big enough to hold the
process.
• Next fit This works like the first fit algorithm, except that it keeps track of the
position of the hole found in the set. The next time it is called, it starts searching
from that position.


• Best fit This allocates the smallest hole that is large enough to hold the process.
• Worst fit This algorithm simply allocates the largest hole.
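The four search strategies differ only in which adequate hole they pick. The sketch below is illustrative: the hole sizes and the 212-unit request are hypothetical, and `start` stands for the position remembered by next fit.

```python
def choose_hole(holes, size, strategy, start=0):
    """Return the index of the hole chosen by the strategy, or None if no hole fits.

    holes is a list of free-block sizes in memory order; start is where next fit resumes.
    """
    fits = [i for i, h in enumerate(holes) if h >= size]
    if not fits:
        return None
    if strategy == "first":
        return fits[0]
    if strategy == "next":
        # Resume scanning from the remembered position, wrapping around once.
        n = len(holes)
        return next(i % n for i in range(start, start + n) if holes[i % n] >= size)
    if strategy == "best":
        return min(fits, key=lambda i: holes[i])  # smallest adequate hole
    if strategy == "worst":
        return max(fits, key=lambda i: holes[i])  # largest hole
    raise ValueError(f"unknown strategy: {strategy}")


holes = [100, 500, 200, 300, 600]  # hypothetical hole sizes
for s in ("first", "next", "best", "worst"):
    print(s, choose_hole(holes, 212, s, start=2))
```

Best fit leaves the smallest leftover hole, while worst fit leaves the largest one, which may still be big enough for a later request.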


Disadvantages of Memory Management Techniques 

The above schemes cause external and internal fragmentation of the
memory, as given below

External Fragmentation There is enough total free memory in the system
to satisfy the request of a process, but the free memory is not contiguous.
Internal Fragmentation The memory wasted inside allocated blocks of
memory is called internal fragmentation.

e.g., consider a process requiring 150 k. If a hole of size 170 k is
allocated to it, the remaining 20 k is wasted.


Compaction 


This is a strategy to solve the problem of external fragmentation. All free
memory is placed together by moving processes to new locations.
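Compaction can be sketched as sliding every allocated block down to the lowest free address, so the free space ends up as one hole at the top. The block names, addresses and sizes below are hypothetical, and the cost of actually copying memory is ignored.

```python
def compact(blocks):
    """Slide allocated blocks to the low end of memory.

    blocks: list of (name, start, size). Returns the relocated blocks and the start
    address of the single free hole that remains after compaction.
    """
    next_free = 0
    relocated = []
    for name, _start, size in sorted(blocks, key=lambda b: b[1]):
        relocated.append((name, next_free, size))  # new base address for this process
        next_free += size
    return relocated, next_free


print(compact([("P1", 0, 100), ("P2", 300, 50), ("P3", 150, 100)]))
```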


Paging 

It is a memory management technique, which allows the memory to be 
allocated to the process wherever it is available. Physical memory is 
divided into fixed size blocks called frames. Logical memory is broken into 
blocks of same size called pages. The backing store is also divided into 
same size blocks. 
When a process is to be executed, its pages are loaded into available
frames. A frame is a fixed-size block of physical memory that holds exactly one page.
Every logical address generated by the CPU is divided into two parts: the
page number (p) and the page offset (d). The page number is used as an
index into a page table.


(Figure: A paging block diagram: the CPU generates a logical address (p, d); p indexes the page table to obtain frame f, and (f, d) is the physical address in main memory)


Each entry in the page table contains the base address (f) of the page in
physical memory. The base address from the pth entry is then combined
with the offset (d) to give the actual address in memory.


Key Points


+ The size of a page is typically a power of 2. Choosing a power of 2 as the
page size makes the translation of a logical address into a page number and
page offset particularly easy.


+ If the size of the logical address space is $2^m$ and the page size is $2^n$ addressing units
(bytes or words), then the high-order $(m - n)$ bits of a logical address designate
the page number and the n low-order bits designate the page offset.
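With a power-of-2 page size, the split into page number and offset is just a shift and a mask. The sketch below assumes a page size of $2^2 = 4$ units and a hypothetical four-entry page table; it shows the translation only, with no valid bits or protection checks.

```python
def translate(logical_address, page_table, n):
    """Translate a logical address for a page size of 2**n addressing units."""
    page = logical_address >> n                # high-order (m - n) bits: page number p
    offset = logical_address & ((1 << n) - 1)  # low-order n bits: offset d
    frame = page_table[page]                   # page-table entry gives frame f
    return (frame << n) | offset               # physical address = f * 2**n + d


page_table = [5, 6, 1, 2]  # hypothetical: page i is held in frame page_table[i]
print(translate(13, page_table, 2))  # page 3, offset 1 -> frame 2 -> 9
```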


Advantages of Paging 
The following are the advantages of paging 


It allows the memory of a process to be non-contiguous. This also solves 
the problem of fitting or storing memory blocks of varying sizes in 
secondary memory (such as backing store). 


Paging also allows sharing of common code. Two or more processes are 
allowed to execute the same code at the same time. 


Virtual Memory 


Virtual memory is the separation of user logical memory from physical memory. It is a
technique for running a process whose size is larger than main memory. Virtual memory
is a memory management scheme that allows the execution of a partially loaded process.
Advantages of Virtual Memory

The advantages of virtual memory can be given as

• The logical address space can be much larger than the physical
address space.

• Address spaces can be shared by several processes.


• Less I/O is required to load or swap a process in memory, so each user
program can run faster.


Segmentation 


• The logical address space is divided into blocks called segments, i.e., the logical
address space is a collection of segments. Each segment has a name and a length.


• A logical address consists of two things
< segment_number, offset >


• Segmentation is a memory-management scheme that supports the
user view of memory. All the locations within a segment are
placed in contiguous locations in primary storage.
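The text does not describe how a <segment_number, offset> pair is mapped to memory; a common design, assumed here, is a segment table whose entries hold a base address and a limit (the segment length). The table values below are hypothetical.

```python
def translate_segment(segment_table, segment_number, offset):
    """Map <segment_number, offset> to a physical address using (base, limit) entries."""
    base, limit = segment_table[segment_number]
    if not 0 <= offset < limit:
        # Hardware would trap to the operating system with an addressing error here.
        raise IndexError("offset beyond end of segment")
    return base + offset


segment_table = {0: (1400, 1000), 1: (6300, 400), 2: (4300, 400)}  # hypothetical
print(translate_segment(segment_table, 2, 53))  # 4353
```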


Demand Paging 
Virtual memory can be implemented by using either of the following.

(i) Demand paging (ii) Demand segmentation
Demand paging is a combination of swapping and paging. With demand-paged
virtual memory, pages are loaded only when they are demanded during
program execution. As long as we have no page faults, the effective
access time is equal to the memory access time. If, however, a page fault
occurs, we must first read the relevant page from disk and then access the
desired word.


Effective Access Time 
Effective access time $= (1 - p) \times ma + p \times (\text{page fault time})$
Here, ma = Memory access time
p = The probability of a page fault ($0 \le p \le 1$)
If p = 0, there are no page faults.
If p = 1, every reference is a fault.
We expect p to be close to zero, that is, there will be only a few page faults.
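The formula can be evaluated directly. The figures below are hypothetical (200 ns memory access and 8 ms to service a page fault) and only illustrate how strongly a small p dominates the result.

```python
def effective_access_time(p, memory_access_ns, page_fault_ns):
    """Effective access time = (1 - p) * ma + p * page fault time, in nanoseconds."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability")
    return (1 - p) * memory_access_ns + p * page_fault_ns


# Hypothetical figures: 200 ns memory access, 8 ms (8,000,000 ns) page-fault service time.
print(effective_access_time(0.001, 200, 8_000_000))  # about 8199.8 ns
```

With these assumed numbers, one fault per thousand references makes the effective access time roughly 40 times the plain memory access time.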




Page Replacement 


In a multiprogramming environment, the following scenario often arises.
While a process is executing, a page fault occurs and there are no free
frames on the free-frame list. This is called over-allocation of memory and
results from an increase in the degree of multiprogramming. Page
replacement is a technique to solve this problem.


Key Points


+ Paging and segmentation implement non-contiguous allocation of memory.
+ Logical memory is divided into blocks called pages, and physical memory is
divided into fixed-size blocks known as frames.


Concept 


If no free frame is available, a frame that is not currently in use is found. Its
contents are written to the backing store and the page tables
are modified to indicate that the page is no longer in memory. Thus, the
frame can be used to hold the page for which the page fault was
generated.


One bit, called the dirty (modify) bit, is associated with each frame. If the bit is 1,
the contents of the page have been modified. If the dirty bit is
0, the contents of the frame need not be written to the backing store.


Page Replacement Algorithms 


In a computer operating system that uses paging for virtual memory 
management, page replacement algorithms decide which memory pages 
to page out when a page of memory needs to be allocated. 


First In First Out (FIFO) 

A FIFO replacement algorithm associates with each page the time when
that page was brought into memory. When a page must be replaced, the
oldest page is chosen.

• It is not strictly necessary to record the time when a page is brought in.
We can create a FIFO queue to hold all pages in memory. We replace the
page at the head of the queue. When a page is brought into memory, we
insert it at the tail of the queue.

e.g., Consider the reference string 


7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1




Initially, our 3 frames are empty. The first 3 references
(7, 0, 1) cause page faults and are brought into these empty frames. The
next reference (2) replaces page 7, because page 7 was brought in first.
Since 0 is the next reference and 0 is already in memory, we have no fault
for this reference. The first reference to 3 results in replacement of page 0,
since it is now first in line. Because of this replacement, the next reference
to 0 will fault. Page 1 is then replaced by page 0. This process continues
as shown in the table below.

Reference string  7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
Page fault        F F F F . F F F F F F . . F F . . F F F
Frame 1           7 7 7 2 2 2 2 4 4 4 0 0 0 0 0 0 0 7 7 7
Frame 2           - 0 0 0 0 3 3 3 2 2 2 2 2 1 1 1 1 1 0 0
Frame 3           - - 1 1 1 1 0 0 0 3 3 3 3 3 2 2 2 2 2 1
(F = page fault, . = no fault, - = empty frame)

Total number of page faults in the above example = 15
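The FIFO queue described above can be simulated directly to check the fault count. This is a minimal sketch that only counts faults; it ignores dirty bits and write-back.

```python
from collections import deque


def fifo_faults(reference_string, frame_count):
    """Count page faults under FIFO page replacement."""
    queue = deque()   # head = page that was brought in earliest
    resident = set()  # pages currently in memory
    faults = 0
    for page in reference_string:
        if page in resident:
            continue  # hit: FIFO order is not updated on a reference
        faults += 1
        if len(queue) == frame_count:
            resident.remove(queue.popleft())  # replace the page at the head of the queue
        queue.append(page)                    # the new page goes to the tail
        resident.add(page)
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3))  # 15
```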





Optimal Page Replacement 

It has the lowest page fault rate of all algorithms and never suffers from
Belady's anomaly. This algorithm replaces the page that will not be used
for the longest period of time. It is very difficult to implement because it
requires future knowledge of page usage.


e.g., Using the previous reference string, the optimal page replacement
algorithm would yield 9 page faults. The first 3 references cause faults that
fill the 3 empty frames. The reference to page 2 replaces page 7, because
page 7 will not be used until reference 18, whereas page 0 will be used at 5
and page 1 at 14. The reference to page 3 replaces page 1, as page 1 will be
the last of the 3 pages in memory to be referenced again.
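Reference string  7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
Page fault        F F F F . F . F . . F . . F . . . F . .
Frame 1           7 7 7 2 2 2 2 2 2 2 2 2 2 2 2 2 2 7 7 7
Frame 2           - 0 0 0 0 0 0 4 4 4 0 0 0 0 0 0 0 0 0 0
Frame 3           - - 1 1 1 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1
(F = page fault, . = no fault, - = empty frame)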
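Because the optimal algorithm needs the future of the reference string, it can only be simulated offline, as in this sketch. Ties among pages that are never used again are broken by frame order, an arbitrary choice that does not change the fault count.

```python
def optimal_faults(reference_string, frame_count):
    """Count page faults under optimal (OPT) replacement, using future references."""
    frames = []
    faults = 0
    for i, page in enumerate(reference_string):
        if page in frames:
            continue
        faults += 1
        if len(frames) < frame_count:
            frames.append(page)
            continue
        future = reference_string[i + 1:]

        def next_use(p):
            # Pages never referenced again count as infinitely far in the future.
            return future.index(p) if p in future else float("inf")

        victim = max(frames, key=next_use)  # page used furthest in the future
        frames[frames.index(victim)] = page
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(optimal_faults(refs, 3))  # 9
```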







































































Key Points


+ Belady's Anomaly For some page replacement algorithms, such as FIFO, the page fault rate
may increase as the number of allocated frames increases.
+ This phenomenon is known as Belady's anomaly.
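The anomaly is easy to reproduce with FIFO. The sketch below uses a compact FIFO fault counter and the reference string 1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5, a standard textbook example that is not taken from this section.

```python
def fifo_faults(reference_string, frame_count):
    """Count page faults under FIFO replacement (oldest page kept at index 0)."""
    memory = []
    faults = 0
    for page in reference_string:
        if page not in memory:
            faults += 1
            if len(memory) == frame_count:
                memory.pop(0)  # evict the page that has been in memory longest
            memory.append(page)
    return faults


refs = [1, 2, 3, 4, 1, 2, 5, 1, 2, 3, 4, 5]
for frames in (3, 4):
    print(frames, "frames:", fifo_faults(refs, frames), "faults")
# 3 frames: 9 faults
# 4 frames: 10 faults
```

Adding a fourth frame increases the number of faults from 9 to 10, which is exactly the behaviour described above.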
Least Recently Used (LRU) Page Replacement

In this algorithm, we use the recent past as an approximation of the near future and
replace the page that has not been used for the longest period of
time.


e.g., Consider the previous reference string (used in the optimal
page replacement algorithm). In the LRU algorithm, the first 5 faults are the
same as those for optimal replacement. When the reference to page 4
occurs, however, LRU replacement sees that, of the 3 frames in memory,
page 2 was used least recently. Thus, the LRU algorithm replaces page 2,
not knowing that page 2 is about to be used. When it then faults for page 2,
the LRU algorithm replaces page 3, since it is now the least recently used
of the three pages in memory. The example below uses 3 frames.
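Reference string  7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
Page fault        F F F F . F . F F F F . . F . F . F . .
Frame 1           7 7 7 2 2 2 2 4 4 4 0 0 0 1 1 1 1 1 1 1
Frame 2           - 0 0 0 0 0 0 0 0 3 3 3 3 3 3 0 0 0 0 0
Frame 3           - - 1 1 1 3 3 3 2 2 2 2 2 2 2 2 2 7 7 7
(F = page fault, . = no fault, - = empty frame)

Total number of page faults in the above example = 12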
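LRU can be sketched with Python's `collections.OrderedDict`, keeping pages ordered from least to most recently used. This is a simulation aid only; real systems approximate LRU with hardware reference bits or counters rather than a full ordering.

```python
from collections import OrderedDict


def lru_faults(reference_string, frame_count):
    """Count page faults under LRU page replacement."""
    frames = OrderedDict()  # iteration order runs from least to most recently used
    faults = 0
    for page in reference_string:
        if page in frames:
            frames.move_to_end(page)  # a hit makes the page the most recently used
            continue
        faults += 1
        if len(frames) == frame_count:
            frames.popitem(last=False)  # evict the least recently used page
        frames[page] = True
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(lru_faults(refs, 3))  # 12
```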







































































Frame Allocation 


The maximum number of frames that can be allocated to a process
depends on the size of the physical memory (i.e., the total number of frames
available). The minimum number of frames that can be allocated to a
process depends on the instruction set architecture.


Thrashing 


• A process is said to be thrashing when it spends more time paging
(i.e., it is generating a lot of page faults) than executing.

• Thrashing causes low CPU utilisation. The
processes which are thrashing queue up for
the paging device.

• The CPU scheduler sees the empty ready
queue and tries to increase the degree of
multiprogramming by introducing more
processes into the system. These new
processes cause more page faults and
increase the length of the queue for the paging device.


(Figure: CPU utilisation versus degree of multiprogramming; beyond a certain degree of multiprogramming, CPU utilisation drops sharply as the system starts thrashing)


File Management 


Logically related data items on secondary storage are usually
organised into named collections called files. A file may contain a report,
an executable program or a set of commands to the operating system. A
file often appears to the users as a linear array of characters or record
structures.

The file system consists of two parts 


(i) A collection of files (ii) A directory structure 
The file management system can be implemented as one or more layers of 
the operating system. 
The common responsibilities of the file management system include the
following
• Mapping of access requests from the logical to the physical file address space.
• Transmission of file elements between main and secondary storage.
• Management of secondary storage, such as keeping track of the
status, allocation and deallocation of space.
• Support for protection and sharing of files, and the recovery and possible
restoration of files after system crashes.


File Attributes 


Each file is referred to by its name. The file is named for the convenience of 
the users and when a file is named, it becomes independent of the user 
and the process. Below are file attributes 


• Name
• Type
• Location
• Size
• Protection
• Time and date


File Operations

The OS provides system calls to create, read, write, delete and truncate files.
The common file operations are as follows
• File Creation To create a file, first, space must be found for the file in the
file system, and second, a new entry must be made in the directory for the new
file.

• Writing a File To write a file, a system call is made that specifies the file
name and the information to be written to the file.

• Reading a File To read from a file, we use a system call that specifies the name of
the file and where in memory the next block of the file should be put. Again, the
directory is searched for the associated entry.

• Repositioning within a File Repositioning within a file doesn't need to invoke any
actual I/O. This is also known as a file seek.

• File Deletion To delete a file, the directory is searched for the named file.
When the associated directory entry is found, all the file space is released and
the directory entry is removed.

• Truncating a File Truncating a file is used when the user wants the attributes of the
file to remain the same but wants to erase the contents of the file. Rather than
deleting the file and recreating it, this function allows all attributes to remain
unchanged except for the file length, which is reset to zero.


Disk Scheduling


One of the responsibilities of the OS is to use the hardware efficiently. For
the disk drives, meeting this responsibility entails having fast access time
and large disk bandwidth.

Access time has two major components

• Seek time is the time for the disk arm to move the heads to the cylinder
containing the desired sector.
• Rotational latency is the additional time for the disk to rotate the
desired sector to the disk head. It is not fixed, so we take its average
value:

$$\text{Average rotational latency} = \frac{\text{Time for one complete revolution}}{2}$$

Disk bandwidth is the total number of bytes transferred, divided by the
total time between the first request for service and the completion of the last transfer.
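Both disk quantities defined above reduce to simple arithmetic. The sketch assumes a hypothetical 7200 RPM drive and a hypothetical transfer of 1,000,000 bytes completed 0.5 s after the first request.

```python
def average_rotational_latency_ms(rpm):
    """Average rotational latency = half the time of one complete revolution."""
    revolution_ms = 60_000 / rpm  # one revolution, in milliseconds
    return revolution_ms / 2


def disk_bandwidth(bytes_transferred, first_request_s, last_transfer_done_s):
    """Bytes transferred divided by the time from the first request to the last completion."""
    return bytes_transferred / (last_transfer_done_s - first_request_s)


print(round(average_rotational_latency_ms(7200), 2))  # 4.17 (ms)
print(disk_bandwidth(1_000_000, 0.0, 0.5))            # 2000000.0 (bytes per second)
```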


FCFS Scheduling 


This is also known as First In First Out (FIFO). It simply serves requests in
the order in which they arrive in the queue.


FIFO scheduling has the following features

• First come, first served scheduling.

• Requests are processed sequentially.

• It is fair to all requests, but it generally does not provide the fastest service.
• Consider a disk queue with requests for I/O to blocks on the following cylinders


Queue = 98, 183, 37, 122, 14, 124, 65, 67
Head starts at 53
(Figure: FCFS disk scheduling on cylinders 0 to 199; the head moves from 53 to 98, 183, 37, 122, 14, 124, 65 and 67 in turn)
Total head movement
$$= (98 - 53) + (183 - 98) + (183 - 37) + (122 - 37) + (122 - 14) + (124 - 14) + (124 - 65) + (67 - 65) = 640$$
The wild swing from 122 to 14 and then back to 124 illustrates the problem
with this schedule. If the requests for cylinders 37 and 14 could be serviced
together, before or after the requests for 122 and 124, the total head
movement could be decreased substantially and performance could
thereby be improved.
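The FCFS total can be checked by summing the distance between consecutive head positions. This sketch uses the queue and starting cylinder from the example above.

```python
def fcfs_head_movement(head, queue):
    """Total cylinders traversed when requests are served strictly in arrival order."""
    total = 0
    for cylinder in queue:
        total += abs(cylinder - head)
        head = cylinder
    return total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(fcfs_head_movement(53, queue))  # 640
```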


Shortest Seek Time First (SSTF) Scheduling 
It selects the request with the minimum seek time from the current head
position. SSTF scheduling is essentially a form of SJF scheduling and may cause
starvation of some requests. It is not an optimal algorithm, but it is an
improvement over FCFS.

Queue = 98, 183, 37, 122, 14, 124, 65, 67


Head starts at 53.
(Figure: SSTF disk scheduling on cylinders 0 to 199; from 53 the head serves 65, 67, 37, 14, 98, 122, 124 and 183)
• This scheduling method results in a total head movement of only
236 cylinders, a little more than one-third of the distance needed for FCFS
scheduling of this request queue. Clearly, this algorithm gives a substantial
improvement in performance.
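SSTF can be sketched as a greedy loop that always picks the pending request nearest the current head position. With the example queue it reproduces the 236-cylinder total; ties, if any, go to the request that appears first in the queue, an arbitrary choice.

```python
def sstf(head, queue):
    """Serve the pending request closest to the current head each time."""
    pending = list(queue)
    order, total = [], 0
    while pending:
        nearest = min(pending, key=lambda c: abs(c - head))
        total += abs(nearest - head)
        head = nearest
        order.append(nearest)
        pending.remove(nearest)
    return order, total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(sstf(53, queue))  # ([65, 67, 37, 14, 98, 122, 124, 183], 236)
```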


SCAN Scheduling 


In the SCAN algorithm, the disk arm starts at one end of the disk and 
moves toward the other end, servicing requests as it reaches each cylinder 
until it gets to the other end of the disk. At the other end, the direction of 
head movement is reversed and servicing continues. The head 
continuously scans back and forth across the disk. The SCAN algorithm is 
sometimes called the elevator algorithm, since the disk arm behaves just 
like an elevator in a building, first servicing all the requests going up and
then reversing to service requests the other way.


Queue = 98, 183, 37, 122, 14, 124, 65, 67
Head starts at 53.
(Figure: SCAN disk scheduling on cylinders 0 to 199 with the same queue)
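The text does not say in which direction the arm is moving when it is at cylinder 53, so this sketch of SCAN supports both directions. It assumes cylinders 0 to 199 (from the figure's axis) and counts the travel to the end of the disk before reversing, as the algorithm is described above.

```python
def scan(head, queue, direction="down", max_cylinder=199):
    """SCAN (elevator): sweep to one end of the disk, then reverse.

    Returns the service order and the total head movement in cylinders.
    """
    if direction not in ("down", "up"):
        raise ValueError("direction must be 'down' or 'up'")
    lower = sorted((c for c in queue if c < head), reverse=True)  # nearest first
    upper = sorted(c for c in queue if c >= head)
    if direction == "down":
        order = lower + upper
        # Travel all the way to cylinder 0, then back up to the highest request.
        total = head + (upper[-1] if upper else 0)
    else:
        order = upper + lower
        # Travel up to the last cylinder, then back down to the lowest request.
        total = (max_cylinder - head) + ((max_cylinder - lower[-1]) if lower else 0)
    return order, total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(scan(53, queue, "down"))  # ([37, 14, 65, 67, 98, 122, 124, 183], 236)
print(scan(53, queue, "up"))    # ([65, 67, 98, 122, 124, 183, 37, 14], 331)
```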


C-SCAN Scheduling 


Circular SCAN is a variant of SCAN, which is designed to provide a more 
uniform wait time. Like SCAN, C-SCAN moves the head from one end of 
the disk to the other, servicing requests along the way. When the head 
reaches the other end, however it immediately returns to the beginning of 
the disk without servicing any requests on the return trip. The C-SCAN 
scheduling algorithm essentially treats the cylinders as a circular list that 
wraps around from the final cylinder to the first one. 


Queue = 98, 183, 37, 122, 14, 124, 65, 67
Head starts at 53.
(Figure: C-SCAN disk scheduling on cylinders 0 to 199 with the same queue)
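C-SCAN serves requests on one upward sweep, jumps back to cylinder 0 and continues upward. This sketch assumes an upward sweep over cylinders 0 to 199 and counts the return trip as head movement; texts differ on whether the return seek is counted.

```python
def c_scan(head, queue, max_cylinder=199):
    """Sweep upward serving requests, return to cylinder 0, then continue upward."""
    upper = sorted(c for c in queue if c >= head)
    lower = sorted(c for c in queue if c < head)
    order = upper + lower
    total = max_cylinder - head  # sweep up to the last cylinder
    if lower:
        total += max_cylinder    # return trip from the last cylinder to cylinder 0
        total += lower[-1]       # continue up to the highest remaining request
    return order, total


queue = [98, 183, 37, 122, 14, 124, 65, 67]
print(c_scan(53, queue))  # ([65, 67, 98, 122, 124, 183, 14, 37], 382)
```

If the return seek is not counted, the same schedule moves the head 183 cylinders.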














Key Points


+ Bootstrap For a computer to start running, for instance when it is powered up
or rebooted, it needs to have an initial program to run. This initial program is
known as the bootstrap program.

+ The bootstrap program is stored in ROM or EEPROM, which is known as
firmware.

+ System Calls System calls provide an interface to the services made available
by the operating system.

+ Single Processor System In a single processor system, there is one main CPU
capable of executing a general purpose instruction set, including instructions
from user processes.

+ Multi Processor System It is also known as a parallel system or tightly
coupled system.

+ Cluster System In a cluster system, computers share common storage and are
closely linked via a LAN.


Different Types of Scheduling Algorithms

Scheduling Algorithm         | CPU Overhead | Throughput | Turnaround Time | Response Time
First In First Out           | Low          | Low        | High            | Low
Shortest Job First           | Medium       | High       | Medium          | Medium
Priority Based Scheduling    | Medium       | Low        | High            | High
Round Robin Scheduling       | High         | Medium     | Medium          | High
Multilevel Queue Scheduling  | High         | High       | Medium          | Medium














Computer Network 


Computer Network 


A computer network is a collection of computers (or computer-like devices) that are able to
communicate with each other through some medium, using hardware and
software. Two computers (or computer-like devices) are said to be
connected if they are able to exchange information or communicate.

Every network includes the following elements, which enable data transfer or sharing.

• At least two computers (or computer-like devices)

• Network interfaces

• A connection medium

• Operating system, strategies, algorithms and protocols





Data Transfer Modes 
There are mainly three modes of data transfer 
Simplex Data transfer only in one direction e.g., radio broadcasting. 


Half Duplex Data transfer in both directions, but not simultaneously, i.e., in
one direction at a time, e.g., talk-back radio, CB (citizens band) radio.


Full Duplex or Duplex Data transfer in both directions, simultaneously 
e.g., telephone. 
(Figure: Different modes of data transfer: simplex (sender to receiver only), half duplex (two way, but not at the same time) and duplex (both ways at the same time))


Elements used in Computer Networks for 
Communication 


Some basic elements which are used in the communication systems of a
computer network are given below


• Data Source Provides the data to transmit.

• Sender (Transmitter) Converts data to signals for transmission.

• Data Transmission System Transmits the data, i.e., the converted signals.
• Receiver Converts received signals to data.

• Destination Receives and uses incoming data.


• Node A device with independent communication ability and a unique
network address.


• Protocol A formal description, comprising rules and conventions, that defines
the method of communication between networking devices.


Methods of Message Delivery (i.e., Casting) 

A message can be delivered in the following ways 

Unicast One device sends a message to one other device, using that device's address.

Broadcast One device sends a message to all other devices on the network.
The message is sent to an address reserved for this purpose.


Multicast One device sends a message to a certain group of devices on the
network.

Types of Networks


The types of network based on their coverage areas are as given below 


LAN (Local Area Network ) 

A LAN is a privately owned network within a single building or campus. LANs
can be small, linking as few as three computers, but often link hundreds of
computers used by thousands of people (as in some IT offices).

An arbitration mechanism is installed in a LAN to decide which machine will
use the medium when more than one machine requests
communication. LANs are used for connecting personal computers,
workstations, routers and other devices.

Their main characteristics are given below

• Topology The geometrical arrangement of the computers or nodes.

• Protocols How they communicate.

• Medium The medium through which they communicate.


MAN (Metropolitan Area Network) 
A MAN covers a city. An example of MAN is cable television network in city. 


WAN (Wide Area Network) 

A wide area network or WAN spans a large geographical area, often a country.
Internet The Internet is also known as a network of networks. It is a system
of linked networks that are worldwide in scope and facilitate data
communication services such as remote login, file transfer, electronic mail,
the World Wide Web and newsgroups.

Network Topology


Network topology is the arrangement of the various elements of a
computer or biological network. Essentially, it is the topological structure of
a network, and it may be depicted physically or logically. Physical topology
refers to the placement of the network's various components, including
device location and cable installation, while logical topology shows how
data flows within a network, regardless of its physical design.


The common network topologies are as follows


Bus Topology

In bus topology, each node (computer, server or other computer-like device) is
directly connected to a common cable, called the common or shared bus.

(Figure: Bus topology)

















Note The drawback of this topology is that if the network cable breaks, the 
entire network will be down. 


Star Topology
In this topology, each node has a dedicated set of wires
connecting it to a central network hub. Since all traffic
passes through the hub, it becomes a central point for
isolating network problems and gathering network statistics.
(Figure: Star topology)

Ring Topology
A ring topology features a logically closed loop.
Data packets travel in a single direction around the
ring from one network device to the next. Each
network device acts as a repeater to keep the
signal strong enough as it travels.
(Figure: Ring topology)

Mesh Topology
In mesh topology, each system is connected to all
other systems in the network. The number of connections in mesh topology is
$$\frac{n(n - 1)}{2}$$
where n is the number of nodes.
(Figure: Mesh topology)
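The formula counts one link per unordered pair of nodes, which `itertools.combinations` enumerates directly. The node names below are hypothetical.

```python
from itertools import combinations


def mesh_links(nodes):
    """List every point-to-point link in a full mesh: one per unordered pair of nodes."""
    return list(combinations(nodes, 2))


nodes = ["A", "B", "C", "D", "E"]  # hypothetical node names
n = len(nodes)
print(len(mesh_links(nodes)), n * (n - 1) // 2)  # 10 10
```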
Key Points


+ In bus topology, a message first goes onto the bus, and then one user
can communicate with another.

+ In star topology, a message first goes to the hub, and the hub then passes it
to the other user.

+ In ring topology, a message passes from one device to the next around the ring until it reaches its destination.

+ In mesh topology, any user can directly communicate with any other user.


Tree Topology 
In this type of network topology, a central root node is
connected to two or more nodes that are one level lower in the
hierarchy.
(Figure: Tree topology)


Hardware/Networking Devices 

Networking hardware is also known as network equipment or computer
networking devices.

Some important networking devices used in the medium of communication 
are given below 


Network Interface Card (NIC) 

NIC provides a physical connection between the networking cable and the 
computer's internal bus. 

NICs come in three basic varieties: 8-bit, 16-bit and 32-bit. The larger the
number of bits that can be transferred to the NIC, the faster the NIC can
transfer data to the network cable.


Repeater 

Repeaters are used to connect two Ethernet segments of any
media type. In larger designs, signal quality begins to deteriorate as
segments exceed their maximum length. Signal transmission always
involves energy loss, so the signal must be refreshed periodically.


Hubs 


Hubs are actually multiport repeaters. A hub takes any incoming signal
and repeats it out of all its ports.
Key Points


+ Repeaters provide the signal amplification required to allow a segment (cable)
to be extended a greater distance.


+ A typical repeater has just 2 ports; a hub generally has from 4 to 24 ports.


Different Types of Ethernet Networks

Ethernet Network Type | Maximum Segment Length (Cable)
10 BASE-T             | 100 m
10 BASE 2             | 185 m (can have up to 30 hosts per segment)
10 BASE 5             | 500 m
10 BASE-FL            | 2000 m

(Figure: Use of repeaters between segments: the weakened signal before the repeater is regenerated as a clean digital signal after it)
Bridges 


When a LAN becomes too large to manage, it is necessary to break up
the network. The function of a bridge is to connect separate networks
together. Bridges do not forward bad or misaligned packets.
(Figure: Use of bridges between different segments: bridges holding tables of MAC addresses join Segments #1, #2 and #3 (Hosts 1 to 9) and forward frames toward the segment of the destination host)


Key Points
+ Bridges map the addresses of the nodes residing on each network segment 
and allow only necessary traffic to pass through the bridge. 
+ When a packet is received by the bridge, the bridge determines the 
destination and source segments. 
+ If the segments are the same, the packet is filtered, if the segments are 
different, then the packet is forwarded to the correct segments. 
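The filter-or-forward decision can be sketched as a table lookup. Two behaviours here go beyond the text and are assumptions, though they are standard for transparent bridges: the bridge learns a sender's segment from the source address of each frame, and it floods frames whose destination it has not yet learned. Host names are hypothetical.

```python
class LearningBridge:
    """Filter/forward decisions for a bridge joining several segments."""

    def __init__(self):
        self.table = {}  # MAC address -> segment it was last seen on

    def receive(self, src_mac, dst_mac, segment):
        self.table[src_mac] = segment  # learn which segment the sender is on
        dst_segment = self.table.get(dst_mac)
        if dst_segment is None:
            return "flood"             # unknown destination: send to all other segments
        if dst_segment == segment:
            return "filter"            # same segment: keep the frame off the other segments
        return f"forward to segment {dst_segment}"


bridge = LearningBridge()
print(bridge.receive("Host1", "Host2", 1))  # flood
print(bridge.receive("Host2", "Host1", 1))  # filter
print(bridge.receive("Host4", "Host1", 2))  # forward to segment 1
```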


Switch 


Switches are an expansion of the concept of bridging. LAN switches can
link 4, 6, 10 or more networks together.

Cut-through switches examine only the packet's destination address before
forwarding it onto its destination segment, while a store-and-forward switch
accepts and analyzes the entire packet before forwarding it to its
destination. Examining the entire packet takes more time, but it allows the
switch to catch certain packet errors and keep them from propagating through
the network.


Routers 


A router forwards packets from one LAN (or WAN) network to another. It is
also used at the edges of networks to connect to the Internet.

Gateway


A gateway acts like an entrance between two different networks. In
organisations, the gateway is the computer that routes traffic from a workstation to
the outside network that is serving web pages.

An ISP (Internet Service Provider) is the gateway for Internet service at homes.


Some IEEE Standards for Networking 





Standard     | Related to/Used for
IEEE 802.2   | Logical Link Control
IEEE 802.3   | Ethernet
IEEE 802.5   | Token Ring
IEEE 802.11  | Wireless LAN
IEEE 802.15  | Bluetooth
IEEE 802.16  | Wireless MAN





Key Points


• Switches have 2 basic architectures: cut-through and store-and-forward.

• Switching is a technology that alleviates congestion in Ethernet LANs by reducing traffic and increasing bandwidth.

• A router operates in the third (network) layer and forwards packets based on network addresses using routing tables and protocols.

• A gateway's main concern is routing traffic from a workstation to the outside network.
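A router's table lookup can be sketched with Python's standard `ipaddress` module. The table entries and interface names below are hypothetical, and the sketch assumes longest-prefix matching (the usual rule in IP routers, though not described in this section): when several entries contain the destination, the most specific network wins.

```python
import ipaddress

# Hypothetical routing table: (destination network, outgoing interface).
ROUTES = [
    (ipaddress.ip_network("0.0.0.0/0"), "to-ISP"),   # default route
    (ipaddress.ip_network("10.0.0.0/8"), "eth1"),
    (ipaddress.ip_network("10.1.2.0/24"), "eth2"),
]


def next_hop(destination):
    """Return the interface of the most specific route containing destination."""
    addr = ipaddress.ip_address(destination)
    matches = [(net, iface) for net, iface in ROUTES if addr in net]
    # Longest prefix = most specific network; the default route always matches.
    net, iface = max(matches, key=lambda entry: entry[0].prefixlen)
    return iface


print(next_hop("10.1.2.7"))  # eth2
print(next_hop("10.9.9.9"))  # eth1
print(next_hop("8.8.8.8"))   # to-ISP
```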


Ethernet 


It is basically a LAN technology which strikes a good balance between speed, cost and ease of installation. The Institute of Electrical and Electronics Engineers (IEEE) defines the Ethernet standard as IEEE Standard 802.3. This standard defines rules for configuring an Ethernet network as well as specifying how elements in an Ethernet network interact with each other.


Ethernet uses Carrier Sense Multiple Access/Collision Detection (CSMA/CD) technology, broadcasting each frame onto the physical medium (wire, fibre and so on). All stations attached to the Ethernet listen to the line for traffic, and the station with the matching destination MAC address accepts the frame.
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[Figure: Ethernet diagram with several hosts and a server]


Token Ring 


• It is another form of network configuration, which differs from Ethernet in that all messages are transferred in a unidirectional manner along the ring at all times.

• Medium Access Control (MAC) is provided by a small frame, the token, that circulates around the ring when all stations are idle. Only the station (node) possessing the token is allowed to transmit at any given time. The sender looks for a free token, changes it to a busy token and appends its data.

• This technology can connect up to 255 nodes in a physical star or ring connection and can sustain 4 or 16 Mbps.
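The token rule can be illustrated with a toy simulation. It assumes a hypothetical ring of named stations, each holding a queue of frames, and a simplified policy in which a station that captures the free token sends at most one frame and then releases a free token to its downstream neighbour; real IEEE 802.5 priorities and token-holding timers are not modelled.

```python
from collections import deque


def token_ring(stations, queues, rounds):
    """Simulate token passing; return the (station, frame) transmissions in order.

    stations: station names in ring order (the token travels in this direction).
    queues:   dict mapping station name -> list of frames waiting to be sent.
    rounds:   number of complete trips of the token around the ring.
    """
    pending = {name: deque(queues.get(name, [])) for name in stations}
    sent = []
    for _ in range(rounds):
        for name in stations:  # the token visits each station in turn
            if pending[name]:
                # Only the token holder may transmit: it marks the token busy,
                # appends one frame, then releases a free token downstream.
                sent.append((name, pending[name].popleft()))
    return sent


log = token_ring(["A", "B", "C", "D"], {"A": ["a1", "a2"], "C": ["c1"]}, rounds=2)
print(log)  # [('A', 'a1'), ('C', 'c1'), ('A', 'a2')]
```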


FDDI (Fibre Distributed Data Interface)
This is a form of network configuration which uses a ring topology of multimode or single-mode optical fibre transmission links operating at 100 Mbps to span up to 200 km and permits up to 500 stations. It employs dual counter-rotating rings. Here, 16 and 48 bit addresses are allowed. In FDDI, the token is absorbed by a station and released as soon as it completes the frame transmission.

FDDI between stations A, B, C and D
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OSI Model 


The Open Systems Interconnection (OSI) model is a reference tool for understanding data communication between any two networked systems. It divides the communication processes into 7 layers. Each layer performs specific functions to support the layers above it and uses services of the layers below it. Each layer represents a different level of abstraction, and layer boundaries are well defined.


Benefits of OSI Model 


It helps users understand the big picture of networking. It helps users 
understand how hardware and software elements function together. OSI 
model makes troubleshooting easier by separating networks into 
manageable pieces. The OSI model provides a common language to 
explain components and their functionality. 





[Figure: OSI seven layer model. Application layers: Application, Presentation, Session. Data flow layers: Transport, Network, Data Link, Physical (1s and 0s on the wire).]

OSI model architecture


Note Without the OSI model, networks would be very difficult to understand 
and implement. 


Physical Layer 

The physical layer coordinates the functions required to transmit a bit stream over a physical medium. It deals with the mechanical and electrical specifications of the interface and the transmission medium. It also defines the procedures and functions that physical devices and interfaces have to perform for transmission to occur.
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Functions of Physical Layer 
The physical layer performs several functions:
• It defines the characteristics of the interface between the devices and the transmission medium.
• It defines the type of transmission medium.
• It defines the transmission rate, i.e., the number of bits sent each second.
• It performs synchronisation of sender and receiver clocks.
• It is concerned with the connection of devices to the medium.
  (i) Point-to-point configuration Two devices are connected together through a dedicated link.
  (ii) Multipoint configuration A link is shared between several devices.
• It is concerned with the physical topology.
• It defines the direction of transmission, i.e., the transmission mode (simplex, half duplex or full duplex).
• It transmits the bit stream over the communication channel.
Hardware Used Repeater and Hub.
Data Unit Bit stream


Data Link Layer 


The data link layer transforms the physical layer, a raw transmission facility, into a reliable link and is responsible for node-to-node delivery. It makes the physical layer appear error free to the upper layer (i.e., the network layer).


Functions of Data Link Layer 

The data link layer is responsible for

Framing (i.e., division of the stream of bits received from the network layer into manageable data units called frames, or segmentation of upper layer datagrams (also called packets) into frames).

Flow Control (i.e., managing communication between a high speed transmitter and a low speed receiver).

Error Control (i.e., adding mechanisms to detect and retransmit damaged or lost frames and to prevent duplication of frames. To achieve error control, a trailer is added at the end of a frame).

Access Control (i.e., determining which device has control over the link at any given time, if two or more devices are connected to the same link).
Physical Addressing (i.e., adding a header to the frame to define the physical address of the sender (source address) and/or receiver (destination address) of the frame).







Hardware Used Bridges and switches. 
Data Unit Frames 


Protocol Used Simplex protocol, stop and wait protocol, sliding window, HDLC (High Level Data Link Control), SDLC, NDP, ISDN, ARP, PSL, OSPF.
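Framing with a header (physical addresses) and an error-control trailer can be sketched as below. The layout is a simplified assumption, not the exact IEEE 802.3 format: a 6-byte destination address, a 6-byte source address, the payload, and a 4-byte trailer holding a CRC-32 from Python's `zlib.crc32` (the same CRC-32 generator polynomial Ethernet uses, although on-the-wire bit ordering is not modelled). `build_frame` and `parse_frame` are hypothetical names.

```python
import zlib


def build_frame(dst, src, payload):
    """Header (destination + source address) + payload + 4-byte CRC-32 trailer."""
    if len(dst) != 6 or len(src) != 6:
        raise ValueError("addresses must be 6 bytes")
    body = dst + src + payload
    trailer = zlib.crc32(body).to_bytes(4, "big")
    return body + trailer


def parse_frame(frame):
    """Return (dst, src, payload), or None if the trailer shows an error."""
    body, trailer = frame[:-4], frame[-4:]
    if zlib.crc32(body).to_bytes(4, "big") != trailer:
        return None  # damaged frame: discard it
    return body[:6], body[6:12], body[12:]


dst = bytes.fromhex("aabbccddeeff")
src = bytes.fromhex("112233445566")
frame = build_frame(dst, src, b"hello")
print(parse_frame(frame)[2])                # b'hello'
damaged = frame[:12] + b"j" + frame[13:]    # corrupt the first payload byte
print(parse_frame(damaged))                 # None
```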


Network Layer 


Network layer is responsible for source to destination delivery of a packet 
possibly across multiple networks (links). If the two systems are connected 
to the same link, there is usually no need for a network layer. However, if 
the two systems are attached to different networks (links) with connecting 
devices between networks, there is often a need for the network layer to accomplish source to destination delivery.


Functions of the Network Layer 

Network layer responsibilities include 

• Logical Addressing The physical addressing implemented by the data link layer handles the addressing problem locally (i.e., if the devices are in the same network). If a packet passes the network boundary, we need another addressing system to distinguish the source and destination systems.


• Routing Independent networks or links are connected together with the help of routers or gateways. Routers route the packets to their final destination. The network layer is responsible for providing the routing mechanism.

• Hardware Used Routers

• Data Units Packets


• Protocols Used IP (Internet Protocol), NAT (Network Address Translation), ARP (Address Resolution Protocol), ICMP (Internet Control Message Protocol), BGP (Border Gateway Protocol), RARP (Reverse Address Resolution Protocol), DHCP (Dynamic Host Configuration Protocol), BOOTP, OSPF.


Transport Layer 


The transport layer is responsible for source to destination (end-to-end) 
delivery of the entire message. Network layer does not recognise any 
relationship between the packets delivered. Network layer treats each 
packet independently, as though each packet belonged to a separate 
message, whether or not it does. The transport layer ensures that the whole 
message arrives intact and in order. 
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Key Points


• The network layer adds a header to the packet coming from the upper layer that, among other things, includes the logical addresses of the sender and receiver.


• The network layer gets each packet to the correct computer; the transport layer gets the entire message to the correct process on that computer.


• The sequence numbers carried by transport layer segments are used to reassemble the message correctly and to identify and replace packets that were lost in transmission.


Functions of Transport Layer 

Responsibilities of the transport layer include

• Service Point Addressing The transport layer header must include a type of address called a service point address (or port address).

• Segmentation and Reassembly A message is divided into transmittable segments, each segment containing a sequence number.

• Flow Control Flow control at this layer is performed end to end rather than across a single link.

• Error Control This layer performs end to end error control by ensuring that the entire message arrives at the receiving transport layer without error (damage, loss or duplication). Error correction is usually achieved through retransmission.

• Connection Control The transport layer can deliver the segments using either a connection oriented or a connectionless approach.

Hardware Used Transport Gateway

Data Unit Segments

Protocol Used TCP (Transmission Control Protocol) for the connection oriented approach and UDP (User Datagram Protocol) for the connectionless approach.
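Segmentation and reassembly with sequence numbers can be sketched as follows. The segment size, the out-of-order arrival and the function names are arbitrary assumptions for illustration; the sketch also shows how gaps in the sequence numbers identify segments that would have to be retransmitted, and how duplicates collapse.

```python
def segment(message, size):
    """Split message into (sequence number, data) segments of at most size bytes."""
    return [(seq, message[i:i + size])
            for seq, i in enumerate(range(0, len(message), size))]


def reassemble(segments, total):
    """Rebuild the message from segments that may arrive out of order.

    total is the number of segments the sender produced. Returns
    (message, missing), where missing lists sequence numbers that were lost.
    """
    received = {seq: data for seq, data in segments}  # duplicates collapse here
    missing = [seq for seq in range(total) if seq not in received]
    if missing:
        return None, missing
    return b"".join(received[seq] for seq in range(total)), []


segs = segment(b"the whole message", 5)
arrived = [segs[2], segs[0], segs[3], segs[1]]  # out of order
print(reassemble(arrived, len(segs)))    # (b'the whole message', [])
print(reassemble(segs[:2], len(segs)))   # (None, [2, 3])
```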







Session Layer 


The session layer is the network dialog controller. It establishes, maintains 
and synchronises the interaction between communicating systems. It also plays an important role in keeping application data separate.


Functions of Session Layer 

Specific responsibilities of the session layer include the following 

Dialog Control Session layer allows the communication between two 
processes to take place either in half duplex or full duplex. It allows 
applications functioning on devices to establish, manage and terminate a 
dialog through a network. 
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Synchronization The session layer allows a process to add check points 
(synchronization points) into a stream of data. For example, if a system is 
sending a file of 2000 pages, we can insert check points after every 
100 pages to ensure that each 100 page unit is received and 
acknowledged independently. In this case, if a crash happens during the 
transmission of page 523, retransmission begins at page 501; pages 1 to 500 need not be retransmitted.
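The restart rule in the synchronization example can be written as a small function. It assumes, as in the example, that check points fall after every 100 pages and that every unit before the last check point has already been acknowledged; `restart_page` is a hypothetical name.

```python
def restart_page(crash_page, interval=100):
    """First page to retransmit when a crash hits while sending crash_page.

    Units before the last check point were acknowledged, so transmission
    resumes at the first page of the unacknowledged unit.
    """
    last_checkpoint = (crash_page - 1) // interval * interval
    return last_checkpoint + 1


print(restart_page(523))  # 501: pages 1 to 500 are not retransmitted
```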


Units Used in Session Layer
• Data Unit Data
• Protocol Used ADSP, ASP, ISO-SP, L2TP, F2F, PAP, PPTP, RPC, SMPP, SDP, ZIP


Presentation Layer 


This layer is responsible for how an application formats data to be sent out 
onto the network. This layer basically allows an application to read 
(or understand) the message. 


Functions of Presentation Layer 

Specific responsibilities of this layer include the following 

Translation Different systems use different encoding systems, so the presentation layer provides interoperability between these different encoding methods. This layer at the sender end changes the information from a sender dependent format into a common format. The presentation layer at the receiver end changes the common format into its receiver dependent format.

Encryption and Decryption This layer provides encryption and decryption mechanisms to assure privacy when carrying sensitive information. Encryption means the sender transforms the original information into another form, and at the receiver end, the decryption mechanism reverses the new form of the data back into its original form.

Compression This layer uses compression mechanism to reduce the 
number of bits to be transmitted. Data compression becomes important in 
the transmission of multimedia such as text, audio and video. 
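Translation and compression can be illustrated with Python's standard codecs and `zlib` module. The choice of ASCII as the sender's format, Unicode text as the common format, EBCDIC code page 037 (`cp037`) as the receiver's format, and zlib as the compression method are assumptions for illustration; encryption is left out because it needs a third-party library. `to_common` and `from_common` are hypothetical names.

```python
import zlib


def to_common(raw, sender_codec):
    """Sender side: sender-dependent bytes -> common format (Unicode text)."""
    return raw.decode(sender_codec)


def from_common(text, receiver_codec):
    """Receiver side: common format -> receiver-dependent bytes."""
    return text.encode(receiver_codec)


ascii_bytes = b"HELLO"                        # sender stores text as ASCII
common = to_common(ascii_bytes, "ascii")
ebcdic_bytes = from_common(common, "cp037")   # receiver uses EBCDIC (code page 037)
print(ascii_bytes.hex())   # 48454c4c4f
print(ebcdic_bytes.hex())  # c8c5d3d3d6

# Compression: repetitive data shrinks, and decompression restores it exactly.
payload = b"AAAA" * 250
packed = zlib.compress(payload)
print(len(payload), len(packed) < len(payload))  # 1000 True
print(zlib.decompress(packed) == payload)        # True
```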


Units used in Presentation Layer
• Data Unit Data
• Protocol Used AFP, ASCII, EBCDIC, ICA, LPP, NCP, NDR, XDR, X.25 PAP.
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Application Layer 


This layer enables the user, whether human or software, to access the network. It provides user interfaces and support for services such as electronic mail, remote file access and transfer, shared database management and other types of distributed information services.


Functions of Application Layer 

Specific services provided by the application layer include the following 

• Network Virtual Terminal A network virtual terminal is a software version of a physical terminal and allows a user to log on to a remote host. To do so, the application creates a software emulation of a terminal at the remote host.

• File Transfer, Access and Management This application allows a user to access files, retrieve files, manage files or control files in a remote computer.

• Mail Services Electronic messaging (i.e., e-mail storage and forwarding) is provided by this application.

• Directory Services This application provides distributed database sources and access for global information about various objects and services.


Units used in Application Layer
• Hardware Used Application Gateway
• Protocol Used HTTP, SMTP, POP3, FTP, Telnet etc.
• Data Unit Data
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TCP/IP Protocol Suite 


The TCP/IP protocol suite, used in the Internet, was developed prior to the OSI model. Therefore, the layers in the TCP/IP protocol suite do not exactly match those in the OSI model. The original TCP/IP protocol suite was defined as having four layers.


OSI model layers              TCP/IP architecture layers
Application, Presentation,
Session                       Application
Transport                     Host-to-host transport (TCP, UDP)
Network                       Internet (IP, ARP, IGMP, ICMP)
Data link, Physical           Network interface (Ethernet, Token ring, Frame relay, ATM)

OSI model layers and a general TCP/IP architecture
As we can see in the above diagram, the TCP/IP (Transmission Control Protocol/Internet Protocol) model contains four layers. The first three layers of the TCP/IP model (Network Interface layer, Internet layer and Transport layer) provide physical standards, network interface, internetworking and transport functions that correspond to the first four layers of the OSI model.


The three topmost layers in the OSI model, however, are represented in TCP/IP by a single layer called the Application layer.

The Transport layer is designed to allow peer entities on the source and destination hosts to carry on a conversation.

The Internet layer permits the host to inject packets into any network and lets them travel independently to the destination.

The Network Interface layer (corresponding to the OSI data link and physical layers) defines the networking scope of the local network connection to which the host is attached.
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Error Control (Detection and Correction) 
Many factors, including line noise, can alter or wipe out one or more bits of a given data unit.
Types of Errors
• Single bit error: only one bit in the data unit has changed.
• Burst error: 2 or more bits in the data unit have changed.

Errors classification
[Figure: in a single bit error, one bit of the sent data unit (a 1) is changed to 0 in the received unit; in a burst error, several bits are corrupted, and the length of the burst is measured from the first corrupted bit to the last corrupted bit.]


Key Points


• Reliable systems must have mechanisms for detecting and correcting such errors.

• Error detection and correction are implemented either at the data link layer or at the transport layer of the OSI model.


Error Detection 
Error detection uses the concept of redundancy, which means adding extra bits for detecting errors at the destination.
Detection Methods
• VRC (Vertical Redundancy Check)
• LRC (Longitudinal Redundancy Check)
• CRC (Cyclic Redundancy Check)
• Checksum

[Figure: at the sender, a generating function appends a redundancy check to the data 10101011, giving data and redundancy check 101010111; at the receiver, a checking function tests the data and redundancy check together.]

How error detection works with redundancy concept
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Note The checking function tests whether the received bit stream passes the checking criteria; if it does, the data portion of the data unit is accepted, otherwise it is rejected.


Vertical Redundancy Check (VRC) 
In this technique, a redundant bit, called the parity bit, is appended to every data unit so that the total number of 1's in the unit (including the parity bit) becomes even. If the number of 1's in the data is already even, the parity bit is 0.

[Figure: the sender's even-parity generator takes data 1100001 (number of 1's in data = 3, i.e., odd) and appends parity bit 1, so the total number of 1's (including data and parity bit) is 4; the receiver's checking function checks whether the total number of 1's is even.]

Detection of error using vertical redundancy check method


Some systems may use odd parity checking, where the number of 1's should be odd. The principle is the same; only the calculation differs.
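The even-parity generator and checker can be sketched on bit strings as follows (the function names are hypothetical). The last line also shows a known limitation of VRC: when an even number of bits flip, the count of 1's stays even and the error goes unnoticed.

```python
def even_parity_bit(bits):
    """Parity bit that makes the total number of 1's (data + parity) even."""
    return "1" if bits.count("1") % 2 else "0"


def vrc_encode(bits):
    """Sender: append the even-parity bit to the data unit."""
    return bits + even_parity_bit(bits)


def vrc_check(unit):
    """Receiver: accept the unit only if its total number of 1's is even."""
    return unit.count("1") % 2 == 0


print(vrc_encode("1100001"))  # 11000011 (three 1's in the data, so parity bit 1)
print(vrc_check("11000011"))  # True
print(vrc_check("11000001"))  # False: a single-bit error is caught
print(vrc_check("11110011"))  # True: two flipped bits go unnoticed
```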


Longitudinal Redundancy Check (LRC) 


In this technique, a block of bits is divided into rows and a redundant row of bits is added to the whole block.

[Figure: the original data at the sender end, 11100111 11011101 00111001 10101001, is arranged as four rows of 8 bits. Each LRC bit is the even-parity bit of its column (for example, a column with a total of four 1's gets parity bit 0), giving LRC 10101010. The original data + LRC, 11100111 11011101 00111001 10101001 10101010, goes to the checking function at the receiver end.]
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Note The first parity bit in the 5th row is calculated based on all first bits. The 
second parity bit is calculated based on all second bits and so on. 
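Computing the LRC column by column reproduces the redundant row in the example above. This is a sketch on bit strings with hypothetical function names; the receiver check assumes even parity, so every column of data plus LRC must contain an even number of 1's.

```python
def lrc(rows):
    """Even-parity LRC: bit i is the parity of bit i across all rows."""
    return "".join(
        str(sum(int(row[i]) for row in rows) % 2) for i in range(len(rows[0]))
    )


def lrc_check(rows_with_lrc):
    """Receiver: every column (data rows + LRC row) must have an even number of 1's."""
    return "1" not in lrc(rows_with_lrc)


block = ["11100111", "11011101", "00111001", "10101001"]
code = lrc(block)
print(code)                       # 10101010
print(lrc_check(block + [code]))  # True
```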


Checksum 
There are two algorithms involved in this process: the checksum generator at the sender end and the checksum checker at the receiver end.
The sender follows these steps
• The data unit is divided into k sections, each of n bits.
• All sections are added together using 1's complement arithmetic to get the sum.
• The sum is complemented and becomes the checksum.
• The checksum is sent with the data.
The receiver follows these steps
• The received unit is divided into k sections, each of n bits.
• All sections are added together using 1's complement arithmetic to get the sum.
• The sum is complemented.
• If the result is zero, the data are accepted; otherwise they are rejected.
Data unit at sender's side (k = 2 sections of 8 bits)

   10101001
   00111001
   --------
   11100010   Sum
   00011101   Checksum

Transmitted: 10101001 00111001 00011101

Checksum method


• At the receiver end, we break this data stream into three sections, each of 8 bits, and perform the steps of the receiver's algorithm.

   10101001   1st 8 bits
   00111001   2nd 8 bits
   --------
   11100010
   00011101   3rd 8 bits
   --------
   11111111   Sum
   00000000   Complement

• All bits 0 (zero) indicates no error in the data.
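The generator and checker steps can be sketched with integers standing for the n-bit sections; the function names are hypothetical. 1's complement addition is done by folding any carry out of the top bit back into the sum (end-around carry), which is what the example above relies on.

```python
def ones_complement_sum(words, n=8):
    """Add n-bit words with end-around carry (1's complement addition)."""
    mask = (1 << n) - 1
    total = 0
    for w in words:
        total += w
        total = (total & mask) + (total >> n)  # wrap any carry back in
    return total


def make_checksum(words, n=8):
    """Sender: complement of the 1's complement sum."""
    return ~ones_complement_sum(words, n) & ((1 << n) - 1)


def verify(words_with_checksum, n=8):
    """Receiver: accept only if the complemented sum is zero."""
    return make_checksum(words_with_checksum, n) == 0


data = [0b10101001, 0b00111001]
cs = make_checksum(data)
print(format(cs, "08b"))    # 00011101
print(verify(data + [cs]))  # True
```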


Cyclic Redundancy Check (CRC) 

CRC is based on binary division. A sequence of redundant bits called CRC 
or the CRC remainder is appended to the end of a data unit, so that the 
resulting data unit becomes exactly divisible by a second, predetermined 
binary number. At its destination, the incoming data unit is divided by the 
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same number. If at this step there is no remainder, the data unit is 
assumed to be intact and therefore is accepted. 


e.g., Generate the CRC code for a frame 100100 using the generator

$$G(x) = x^3 + x^2 + 1 = 1\cdot x^3 + 1\cdot x^2 + 0\cdot x^1 + 1\cdot x^0$$

Power         3 2 1 0
Coefficient   1 1 0 1

We do not have an x^1 term, which is why we put a 0 for that coefficient; the divisor is therefore 1101.

The dividend is the data plus extra zeros; the number of zeros is one less than the number of bits in the divisor (here 3 zeros, giving 100100000).

             111101      Quotient
          ---------
   1101 ) 100100000   Data plus extra zeros
          1101
          ----
           1000
           1101
           ----
            1010
            1101
            ----
             1110
             1101
             ----
              0110
              0000   When the leftmost bit of the remainder is 0,
              ----   we must use 0000 instead of the original divisor
               1100
               1101
               ----
                001   Remainder (CRC)

Method of error detection using CRC method

Note Each bit of the divisor is subtracted from the corresponding bit of the dividend without disturbing the next higher bit (modulo-2 subtraction has no borrow, so it is the same as XOR).


In the above example, the divisor 1101 is subtracted from the first 4 bits of the dividend, 1001, yielding 0100, i.e., 100 (the leading 0 of the remainder is dropped off).


So, the input data stream for the receiver is the original data plus the remainder, i.e., the CRC.

Input stream = 100100001

Now, we perform the CRC checker process. Here, we again perform the same modulo-2 division. If the remainder is all zeros, the CRC is dropped and the data is accepted. Otherwise, the received stream of bits is discarded and the data is resent.
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
             111101
          ---------
   1101 ) 100100001   Data plus CRC received
          1101
          ----
           1000
           1101
           ----
            1010
            1101
            ----
             1110
             1101
             ----
              0110
              0000   When the leftmost bit of the remainder is 0,
              ----   we must use 0000 instead of the original divisor
               1101
               1101
               ----
                000   Remainder (all zeros)

Method of error detection using CRC method

The remainder received is all zeros, so the CRC is dropped and the data (100100) is accepted at the receiver end. This shows that there is no error in the data.
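Both divisions above can be reproduced with a short modulo-2 long division on bit strings (leftmost bit = highest power). This is a sketch with hypothetical function names: at each step the next bit is brought down, the divisor (or 0000 when the leftmost bit is 0) is XORed in, and the leading 0 is dropped, exactly as in the worked example.

```python
def mod2_div(dividend, divisor):
    """Modulo-2 (XOR) long division of bit strings; returns (quotient, remainder)."""
    k = len(divisor)
    rem = dividend[:k - 1]
    quotient = ""
    for bit in dividend[k - 1:]:
        rem += bit  # bring down the next bit
        if rem[0] == "1":
            quotient += "1"
            rem = "".join("0" if a == b else "1" for a, b in zip(rem, divisor))
        else:
            quotient += "0"  # leftmost bit is 0: XOR with 000...0 changes nothing
        rem = rem[1:]  # drop the leading bit, which is now always 0
    return quotient, rem


def crc_encode(data, generator):
    """Sender: append the remainder of data + (len(generator) - 1) zeros."""
    _, remainder = mod2_div(data + "0" * (len(generator) - 1), generator)
    return data + remainder


def crc_check(received, generator):
    """Receiver: accept only if the remainder is all zeros."""
    _, remainder = mod2_div(received, generator)
    return "1" not in remainder


print(mod2_div("100100000", "1101"))   # ('111101', '001')
print(crc_encode("100100", "1101"))    # 100100001
print(crc_check("100100001", "1101"))  # True
print(crc_check("100110001", "1101"))  # False: one flipped bit is detected
```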

Performance of CRC 

CRC is a very effective error detection method if the divisor is chosen according to the following rule: a polynomial selected as the divisor should have at least the following properties.

• It should not be divisible by x.
• It should be divisible by (x + 1).
The first condition guarantees that all burst errors of a length equal to the degree of the polynomial are detected. The second condition guarantees that all burst errors affecting an odd number of bits are detected.
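Both properties can be tested directly on a generator written as a bit string. Over modulo-2 arithmetic, a polynomial is divisible by x exactly when its x^0 coefficient is 0, and it is divisible by (x + 1) exactly when P(1) = 0, i.e., when it has an even number of nonzero terms. The degree-8 polynomial x^8 + x^2 + x + 1 is chosen only for illustration; note that the example generator 1101 satisfies the first condition but not the second.

```python
def not_divisible_by_x(gen):
    """True when the x^0 coefficient (last bit) is 1."""
    return gen[-1] == "1"


def divisible_by_x_plus_1(gen):
    """Over GF(2), (x + 1) divides P(x) exactly when P has an even number of terms."""
    return gen.count("1") % 2 == 0


for g in ["1101", "100000111"]:   # x^3 + x^2 + 1 and x^8 + x^2 + x + 1
    print(g, not_divisible_by_x(g), divisible_by_x_plus_1(g))
# 1101 True False
# 100000111 True True
```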


Error Correction 


Error correction in the data link layer is implemented simply: any time an error is detected in an exchange, a negative acknowledgement (NAK) is returned and the specified frames are retransmitted. This process is called Automatic Repeat Request (ARQ). Retransmission of data happens in three cases:
• Damaged frame
• Lost frame
• Lost acknowledgement

Error control (error correction) is classified as
• Stop and wait ARQ
• Sliding window ARQ, which is either Go-back-n or Selective reject

Error correction classification
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Stop and Wait ARQ 

Stop and wait ARQ includes retransmission of data in case of lost or damaged frames. For retransmission to work, four features are added to the basic flow control mechanism.


Key Points
• The sending device keeps a copy of the last frame transmitted until it receives an acknowledgement for that frame.


• For identification purposes, both data frames and ACK frames are numbered alternately 0 and 1.


• A data 0 frame is acknowledged by an ACK 1 frame, indicating that the receiver has received data 0 and is now expecting data 1.


• If an error is discovered in a data frame, indicating that it has been corrupted in transit, a NAK frame is returned. NAK frames, which are not numbered, tell the sender to retransmit the last frame sent.

• The sender device is equipped with a timer. If an expected acknowledgement is not received within an allotted time period, the sender assumes that the last data frame was lost in transit and sends it again.
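The roles of the kept copy, the alternating 0/1 numbers and the timer can be seen in a small deterministic simulation. It is a sketch under stated assumptions: losses are given as hypothetical sets of transmission-attempt numbers, every loss is recovered by a timeout, and NAKs are not modelled. The receiver uses the sequence number to discard the duplicate that a lost ACK causes.

```python
def stop_and_wait(frames, lost_frames=(), lost_acks=()):
    """Simulate stop and wait ARQ with 1-bit sequence numbers.

    lost_frames / lost_acks hold transmission attempt numbers (0, 1, 2, ...)
    whose data frame / ACK is lost; each loss triggers a timeout and a resend.
    Returns (frames delivered to the receiver's upper layer, attempts made).
    """
    delivered = []
    expected = 0  # sequence number the receiver wants next
    attempt = 0
    for i, data in enumerate(frames):
        seq = i % 2  # frames are numbered alternately 0 and 1
        while True:
            this_try = attempt
            attempt += 1  # the sender resends from its kept copy
            if this_try in lost_frames:
                continue  # timer expires: resend the same frame
            if seq == expected:
                delivered.append(data)  # new frame: pass it up
                expected ^= 1           # now expecting the other number
            # otherwise it is a duplicate of a delivered frame and is discarded
            if this_try in lost_acks:
                continue  # ACK lost: timer expires, the frame is resent
            break  # ACK (numbered 1 - seq) received: move to the next frame
    return delivered, attempt


print(stop_and_wait(["A", "B", "C"]))                                 # (['A', 'B', 'C'], 3)
print(stop_and_wait(["A", "B", "C"], lost_frames={1}, lost_acks={3})) # (['A', 'B', 'C'], 5)
```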


Sliding Window ARQ 

To cover retransmission of lost or damaged frames, three features are added to the basic flow control mechanism of sliding window.

• The sending device keeps copies of all transmitted frames until they have been acknowledged.

• In addition to ACK frames, the receiver has the option of returning a NAK frame if the data have been received damaged. A NAK frame tells the sender to retransmit a damaged frame. Here, both ACK and NAK frames must be numbered for identification. ACK frames carry the number of the next frame expected. NAK frames, on the other hand, carry the number of the damaged frame itself. If the last ACK was numbered 3, an ACK 6 acknowledges the receipt of frames 3, 4 and 5 as well. If data frames 4 and 5 are received damaged, both NAK 4 and NAK 5 must be returned.

• Like stop and wait ARQ, the sending device in sliding window ARQ is equipped with a timer to enable it to handle lost acknowledgements.
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Key Points
• In sliding window ARQ, (n - 1) frames (the size of the window) may be sent before an acknowledgement must be received.


• If (n - 1) frames are awaiting acknowledgement, the sender starts a timer and waits before sending any more.


• If the allotted time has run out, the sender assumes that the frames were not received and retransmits one or all frames, depending on the protocol.


Go-back-n ARQ In this method, if one frame is lost or damaged, all frames sent since the last frame acknowledged are retransmitted.


Selective Reject ARQ In this method, only the specific damaged or lost frame is retransmitted. If a frame is corrupted in transit, a NAK is returned and the frame is resent out of sequence. The receiving device must be able to sort the frames it has and insert the retransmitted frame into its proper place in the sequence.
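The difference between the two methods is which frames are resent. The sketch below assumes that `sent` lists the frames transmitted since the last acknowledged frame, in order, and that frames before the first damaged one arrived intact (so Go-back-n resends from the first damaged frame onward); `retransmitted` is a hypothetical name.

```python
def retransmitted(sent, damaged, policy):
    """Frames resent after the frames in damaged are lost or damaged.

    sent:    sequence numbers sent since the last acknowledged frame, in order.
    damaged: sequence numbers that were lost or damaged.
    policy:  "go-back-n" or "selective-reject".
    """
    if policy == "go-back-n":
        first_bad = min(sent.index(s) for s in damaged)
        return sent[first_bad:]  # the bad frame and everything sent after it
    if policy == "selective-reject":
        return [s for s in sent if s in damaged]  # only the bad frames
    raise ValueError(policy)


window = [0, 1, 2, 3, 4, 5, 6]
print(retransmitted(window, [3], "go-back-n"))         # [3, 4, 5, 6]
print(retransmitted(window, [3], "selective-reject"))  # [3]
```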


Flow Control 


One important aspect of data link layer is flow control. Flow control refers to 
a set of procedures used to restrict the amount of data the sender can 
send before waiting for acknowledgement. 


Categories of flow control
• Stop and wait: send one frame at a time.
• Sliding window: send several frames at a time.


Stop and Wait 


In this method, the sender waits for an acknowledgement after every frame it sends. Only when an acknowledgement has been received is the next frame sent. This process continues until the sender transmits an End of Transmission (EOT) frame.

There are two ways to manage data transmission when a fast sender wants to transmit data to a low speed receiver:

• The receiver sends information back to the sender giving it permission to send more data, i.e., feedback or acknowledgement based flow control.

• The rate at which the sender may transmit data is limited without using feedback from the receiver, i.e., rate based flow control.
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[Figure: stop and wait exchange between sender and receiver, with time running down each side]
Communication network between sender and receiver


Advantages of Stop and Wait 
It is simple, and each frame is checked and acknowledged before the next one is sent.


Disadvantages of Stop and Wait
• It is inefficient if the distance between devices is long.


• The time spent waiting for ACKs between frames can add a significant amount to the total transmission time.
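The inefficiency over long distances can be quantified. Assuming a frame transmission time T_t = L/R, a one-way propagation delay T_p, and negligible ACK transmission and processing times, one frame is delivered per cycle of T_t + 2T_p, so the link utilisation is U = T_t / (T_t + 2T_p). The signal speed of 2e8 m/s and the frame size and link rate below are illustrative assumptions.

```python
def stop_and_wait_utilisation(frame_bits, rate_bps, distance_m, speed_mps=2e8):
    """Fraction of time the link carries data under stop and wait.

    Assumes ACK transmission and processing times are negligible.
    """
    t_frame = frame_bits / rate_bps   # time to put the frame on the link
    t_prop = distance_m / speed_mps   # one-way propagation delay
    return t_frame / (t_frame + 2 * t_prop)


# 1000-bit frames at 1 Mbps
print(round(stop_and_wait_utilisation(1000, 1e6, 200), 3))        # 0.998 over 200 m
print(round(stop_and_wait_utilisation(1000, 1e6, 2_000_000), 3))  # 0.048 over 2000 km
```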


Sliding Window 

In this method, the sender can transmit several frames before needing an 
acknowledgement. The sliding window refers to imaginary boxes at both the sender and the receiver. This window can hold frames at either end and provides the upper limit on the number of frames that can be transmitted before requiring an acknowledgement.


Key Points


• The frames in the window are numbered modulo-n, which means they are numbered from 0 to n - 1. For example, if n = 8, the frames are numbered 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, 2, 3, 4, 5, 6, 7, 0, 1, ... and so on. The size of the window is (n - 1); in this case, the size of the window is 7.


• In other words, the window cannot cover the whole modulus (8 frames); it covers one frame less, that is, 7.


When the receiver sends an ACK, it includes the number of the next frame it 
expects to receive. When the receiver sends an ACK containing the number 5, it means all frames up to number 4 have been received.
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Sender Sliding Window 


Size of sender window = 7, which means we are using modulo 8.
Sender window
[Figure: sender window over frames numbered modulo 8; the left wall moves to the right when a frame is sent, and the right wall moves to the right when an ACK is received.]

Sender sliding window











Receiver Sliding Window 
Receiver window 
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Key Points


• As each ACK is sent out, the receiving window expands to include as many new placeholders as newly acknowledged frames. The window expands to include

  number of new frame spaces = number of the most recently acknowledged frame - number of the previously acknowledged frame (taken modulo 8 in a seven-frame window).

• e.g., in a seven-frame window, if the prior ACK was for frame 2 and the current ACK is for frame 5, the window expands by three (5 - 2). If the prior ACK was for frame 3 and the current ACK is for frame 1, the window expands by six (1 + 8 - 3).
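The window-expansion rule is modular subtraction of ACK numbers. This sketch (hypothetical function name, modulo-8 numbering assumed as in the examples) lists the frames a cumulative ACK newly acknowledges; since an ACK carries the number of the next frame expected, moving from ACK p to ACK a covers frames p, p + 1, ..., a - 1 (mod 8).

```python
def newly_acknowledged(prev_ack, ack, modulus=8):
    """Frame numbers covered by a cumulative ACK.

    The window grows by the same count: (ack - prev_ack) mod n.
    """
    count = (ack - prev_ack) % modulus
    return [(prev_ack + i) % modulus for i in range(count)]


print(newly_acknowledged(2, 5))       # [2, 3, 4]: window expands by 3
print(len(newly_acknowledged(3, 1)))  # 6, i.e. 1 + 8 - 3
print(newly_acknowledged(3, 6))       # [3, 4, 5]: ACK 6 after ACK 3
```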


Medium Access Control Sublayer 


The protocols used to determine who goes next on a multiaccess channel belong to a sublayer of the data link layer. This sublayer is called the medium access control sublayer.


Channel Allocation 


Static Channel Allocation FDM (Frequency Division Multiplexing) and TDM (Time Division Multiplexing) are used for this purpose. Let us focus on FDM. Consider a channel of capacity C bit/sec with an arrival rate of λ frames/sec, where each frame has a length drawn from an exponential probability density function with mean 1/μ bits/frame. Then the mean time delay T is

$$T = \frac{1}{\mu C - \lambda}$$

If the channel is divided into N independent subchannels, each with capacity C/N bit/sec and arrival rate λ/N frames/sec, then

$$T_{FDM} = \frac{1}{\mu (C/N) - (\lambda/N)} = \frac{N}{\mu C - \lambda} = NT$$
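The two delay formulas can be evaluated directly. The numbers (C = 100 Mbps, mean frame length 1/μ = 10,000 bits, λ = 5000 frames/sec, N = 10) are illustrative assumptions; the result shows that splitting the channel into N static subchannels makes the mean delay N times worse.

```python
def mean_delay(capacity_bps, arrival_rate, mean_frame_bits):
    """T = 1 / (mu*C - lambda), where 1/mu is the mean frame length in bits."""
    mu = 1 / mean_frame_bits
    service_rate = mu * capacity_bps  # frames per second the channel can carry
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: lambda must be below mu*C")
    return 1 / (service_rate - arrival_rate)


def fdm_delay(capacity_bps, arrival_rate, mean_frame_bits, n):
    """Each of N subchannels gets capacity C/N and arrival rate lambda/N."""
    return mean_delay(capacity_bps / n, arrival_rate / n, mean_frame_bits)


print(mean_delay(100e6, 5000, 10_000))     # about 0.0002 s (200 microseconds)
print(fdm_delay(100e6, 5000, 10_000, 10))  # about 0.002 s, i.e. N times larger
```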





Dynamic Channel Allocation 

In these methods, the channel is allocated to a particular system dynamically, i.e., there is no predetermined order of senders for accessing the channel in order to send data.


Multiple Access Protocols 
These protocols can be divided into two categories


ALOHA
• Pure ALOHA
• Slotted ALOHA
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CSMA
• 1-persistent CSMA
• Non-persistent CSMA
• p-persistent CSMA


ALOHA Protocols 

ALOHAnet, also known as the ALOHA system or simply ALOHA, was a pioneering computer networking system developed at the University of Hawaii. It was designed for a radio (wireless) LAN, but it can be used on any shared medium.


The ALOHA protocols can be of the following types


Pure ALOHA In this approach, a node that wants to transmit goes ahead and sends the packet on its broadcast channel with no consideration whatsoever as to whether anybody else is transmitting. One serious drawback here is that we are not sure whether the data has been received properly at the receiver end. To resolve this, in pure ALOHA, when a node finishes transmitting, it expects an acknowledgement within a finite amount of time; otherwise, it simply retransmits the data.


Key Points


• Pure ALOHA works well in small networks where the load is not high.

• In large, load-intensive networks where many nodes may want to transmit at the same time, the pure ALOHA scheme fails miserably.
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Time (shaded slots indicate collisions)

Slotted ALOHA This is quite similar to pure ALOHA, differing only in the 
way transmissions take place. Instead of transmitting right at demand time, 
the sender waits for some time. This delay is specified as follows: the timeline is divided into equal slots, and transmission is required to take place only at slot boundaries.
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To be more precise, slotted ALOHA makes the following assumptions
• All frames consist of exactly L bits.
• Time is divided into discrete frame time slots, i.e., a slot equals the time to transmit one frame.
• Nodes start to transmit frames only at the beginning of slots.
• The nodes are synchronized, so that each node knows when the slots begin.
• If two or more frames collide in a slot, then all the nodes detect the collision event before the slot ends.
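Why pure ALOHA breaks down under load, and how much slotting helps, can be made quantitative with the standard ALOHA throughput results, which are not derived in this section. They assume attempts (new frames plus retransmissions) arrive as a Poisson process with G attempts per frame time: pure ALOHA gives S = G·e^(-2G) because a frame is vulnerable for two frame times, and slotted ALOHA gives S = G·e^(-G) because slotting halves that vulnerable period.

```python
import math


def pure_aloha_throughput(G):
    """Successful frames per frame time; a frame is vulnerable for 2 frame times."""
    return G * math.exp(-2 * G)


def slotted_aloha_throughput(G):
    """Slots halve the vulnerable period to 1 frame time."""
    return G * math.exp(-G)


for G in (0.25, 0.5, 1.0, 2.0):
    print(G, round(pure_aloha_throughput(G), 3), round(slotted_aloha_throughput(G), 3))
# Pure ALOHA peaks at G = 0.5 (S = 1/(2e), about 0.184);
# slotted ALOHA peaks at G = 1 (S = 1/e, about 0.368).
```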
[Figure: frames transmitted by Stations 1 to 6 over time]
Key Points


+ Whenever multiple users have unregulated access to a single line, there is a
danger of signals overlapping and destroying each other. Such overlaps,
which turn the signals into unusable noise, are called collisions.

+ A LAN therefore needs a mechanism to coordinate traffic, minimize the
number of collisions that occur and maximize the number of frames that are
delivered successfully. The access mechanism used in Ethernet is called
CSMA/CD, standardized in IEEE 802.3.

+ Switches are hardware and/or software devices capable of creating temporary
connections between 2 or more devices linked to the switch but not to each
other.


Carrier Sense Multiple Access/Collision 
Detection (CSMA/CD) 

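CSMA/CD was developed to minimize the chance of collision and, therefore,
increase performance. The idea behind carrier sense multiple access is to
listen to the channel before a transmission takes place; with collision
detection, a station also keeps monitoring the channel while it transmits
and stops as soon as it detects a collision.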
1. There is data from the user to send.
2. Assemble a frame (physical addresses, i.e., MAC addresses, are used).
3. Set Attempt ← 1.
4. Is some other station transmitting? If yes, keep sensing until the
channel is free.
5. Transmit the 1st bit of the frame.
6. Collision detected? If yes, broadcast a JAM signal and run the backoff
subalgorithm. If the station recovers, it tries again from step 4; if it
does not recover, frame transmission has failed (too many collisions) and
the procedure ends.
7. If no collision is detected: transmission finished? If no, transmit the
next bit of the frame and return to step 6; if yes, the frame has been
transmitted successfully.
Flow diagram of CSMA/CD
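The backoff subalgorithm in the diagram is not detailed in the text. The sketch below assumes the truncated binary exponential backoff used by classic IEEE 802.3 Ethernet: after the n-th collision a station waits K slot times, with K chosen uniformly from 0 to 2^min(n, 10) - 1, and it abandons the frame after 16 attempts. The constants and the name `backoff_slots` are illustrative assumptions, not part of the diagram.

```python
import random
from typing import Optional

MAX_ATTEMPTS = 16    # assumed: classic Ethernet attempt limit
BACKOFF_LIMIT = 10   # assumed: the range stops doubling after 10 collisions


def backoff_slots(collisions: int, rng: random.Random) -> Optional[int]:
    """Return how many slot times to wait after `collisions` collisions,
    or None when the frame should be abandoned (too many collisions)."""
    if collisions >= MAX_ATTEMPTS:
        return None                      # frame transmission failed
    k = min(collisions, BACKOFF_LIMIT)
    return rng.randint(0, 2 ** k - 1)    # uniform in 0 .. 2^k - 1 (inclusive)


if __name__ == "__main__":
    rng = random.Random(1)
    for n in range(1, 6):
        print(n, backoff_slots(n, rng))
```

Doubling the range after each collision spreads retransmissions out as contention grows, so repeated collisions lead to longer expected waits.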


1-Persistent CSMA


In this scheme, transmission proceeds immediately if the carrier is idle.
However, if the carrier is busy, the sender continues to sense the carrier
until it becomes idle. The main problem here is that, if more than one
transmitter is ready to send, a collision is guaranteed. In case of a
collision, stations wait for a random period of time before retransmission.


(Figure: stations A, B, C and D on a shared channel.
Carrier sense: all stations are listening to the channel.
Multiple access: A and D access the channel simultaneously to send data
to C and B, respectively.
Collision detection (backoff algorithm).)
There are three types of CSMA approaches: 1-persistent (described above),
non-persistent and p-persistent.


Non-persistent CSMA 


In this scheme, the broadcast channel is not monitored continuously. If the
channel is busy, the sender waits for a random interval before sensing it
again, and transmits whenever the carrier is idle. This decreases the
probability of collisions, but it is not efficient in a low-load situation,
where the number of collisions is small anyway. The problems it entails are:

• If the back-off time is too long, the idle time of the carrier is wasted.
• It may result in long access delays.


p-Persistent CSMA 


Even if a sender finds the carrier to be idle, it uses a probability
distribution to decide whether to transmit or not. Put simply: "toss a
coin to decide". If the carrier is idle, transmission takes place with
probability p; otherwise (with probability q = 1 - p) the sender defers
and waits for the next slot.

This scheme is a good trade-off between the non-persistent and 1-persistent
schemes. For low-load situations p can be high (p = 1 gives 1-persistent
CSMA), and for high-load situations p should be lower.
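The three persistence strategies differ only in what a station does after one sensing of the channel. The following sketch captures that single decision; the function name, the returned action strings and the use of a seeded `random.Random` are assumptions made for illustration.

```python
import random


def csma_decision(strategy: str, channel_idle: bool,
                  rng: random.Random, p: float = 0.5) -> str:
    """Return what a station does after sensing the channel once."""
    if strategy == "1-persistent":
        # Transmit as soon as the carrier is idle; otherwise keep sensing.
        return "transmit" if channel_idle else "keep sensing"
    if strategy == "non-persistent":
        # Do not monitor continuously: back off for a random time if busy.
        return "transmit" if channel_idle else "wait random time"
    if strategy == "p-persistent":
        if not channel_idle:
            return "keep sensing"
        # Idle carrier: transmit with probability p, defer with probability 1 - p.
        return "transmit" if rng.random() < p else "defer to next slot"
    raise ValueError("unknown strategy: " + strategy)


if __name__ == "__main__":
    rng = random.Random(42)
    decisions = [csma_decision("p-persistent", True, rng, p=0.3) for _ in range(1000)]
    print(decisions.count("transmit") / len(decisions))
```

With p = 1 the p-persistent branch always transmits on an idle channel, which is exactly 1-persistent behaviour; the printed fraction approaches p as the number of trials grows.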


Switching 


Whenever we have multiple devices, we have the problem of how to connect
them to make one-to-one communication possible. Two solutions are
possible:

• Install a point-to-point connection between each pair of devices
(an impractical and wasteful approach when applied to a very large network).

• For a large network, we can use switching. A switched network consists
of a series of interlinked nodes, called switches.


In a switched network, some of these nodes are connected to 
communicating devices. Others are used only for routing. 








Switching methods
• Circuit switching: space division switching, time division switching
• Packet switching: datagram approach, virtual circuit approach
• Message switching
Classification of switching


Circuit Switching 


It creates a direct physical connection between two devices such as
phones or computers. In the given diagram, instead of point-to-point
connections between the 3 computers on the left (A, B, C) and the 4
computers on the right (D, E, F, G), which would require 3 × 4 = 12 links,
we can use 4 switches to reduce the number and the total length of links.
In the figure, computer A is connected through switches I, II and III to
computer D. A circuit switch is a device with n inputs and m outputs that
creates a temporary connection between an input link and an output link.


Space division switching 
Separates the paths in the circuit from each other spatially.


Time division switching 
Uses time division multiplexing to achieve switching. 
Time division switching network


Circuit switching was designed for voice communication. In a telephone
conversation, for example, once a circuit is established, it remains
connected for the duration of the session.


Disadvantages of Circuit Switching 


It is less well suited to data and other non-voice transmissions. Non-voice
transmissions tend to be bursty, meaning that data come in spurts with
idle gaps between them. Therefore, when circuit-switched links are used for
data transmission, the line is often idle and its facilities are wasted.

A circuit switched link creates the equivalent of a single cable between 
two devices and thereby assumes a single data rate for both devices. This 
assumption limits the flexibility and usefulness of a circuit switched 
connection. 

Once a circuit has been established, that circuit is the path taken by all 
parts of the transmission, whether or not it remains the most efficient or 
available. 

Circuit switching sees all transmissions as equal. Any request is granted 
to whatever link is available. But often with data transmission, we want to 
be able to prioritise. 


Packet Switching 


The packet switching concept came into the picture to overcome the
disadvantages of circuit switching.


In a packet-switched network, data are transmitted in discrete units of
potentially variable-length blocks called packets. Each packet contains not
only data but also a header with control information (such as priority codes
and source and destination addresses). The packets are sent over the
network node to node. At each node, the packet is stored briefly, then
routed according to the information in its header.


There are two popular approaches to packet switching. 
(i) Datagram (ii) Virtual circuit 


Datagram Approach 


Each packet is treated independently from all others. Even when one 
packet represents just a piece of a multipacket transmission, the network 
(and network layer functions) treats it as though it existed alone. 


In the datagram approach, packets are referred to as datagrams.
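As a rough illustration of the datagram idea, the sketch below cuts a message into independently handled packets and rebuilds it even when they arrive out of order. The `Datagram` fields and the sequence number are hypothetical; in the Internet, IP itself does not reorder datagrams, and a higher layer such as TCP takes care of ordering.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Datagram:
    src: str        # source address (illustrative string, not a real IP header)
    dst: str        # destination address
    seq: int        # hypothetical piece number within the original message
    payload: bytes


def split_message(src: str, dst: str, message: bytes, size: int) -> List[Datagram]:
    """Cut a message into independent datagrams of at most `size` bytes."""
    return [
        Datagram(src, dst, index, message[offset:offset + size])
        for index, offset in enumerate(range(0, len(message), size))
    ]


def reassemble(datagrams: List[Datagram]) -> bytes:
    """Rebuild the message; datagrams may arrive in any order because each
    one was routed independently."""
    ordered = sorted(datagrams, key=lambda d: d.seq)
    return b"".join(d.payload for d in ordered)


if __name__ == "__main__":
    parts = split_message("A", "D", b"HELLO WORLD", 4)
    arrived = [parts[2], parts[0], parts[1]]   # out-of-order arrival
    print(reassemble(arrived))                 # b'HELLO WORLD'
```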
Virtual Circuit Approach

The relationship between all packets belonging to a message or session is 
preserved. A single route is chosen between sender and receiver at the 
beginning of the session. When the data are sent, all packets of the 
transmission travel one after another along that route. 

It can be implemented in two formats:

• Switched Virtual Circuit (SVC)
• Permanent Virtual Circuit (PVC)


SVC (Switched Virtual Circuit) 

The SVC format is conceptually comparable to dial-up lines in circuit
switching. In this method, a virtual circuit is created whenever it is needed
and exists only for the duration of the specific exchange.
(Figure: SVC operation; panel (c): connection release)


PVC (Permanent Virtual Circuit) 
The PVC format is comparable to 
leased lines in circuit switching. In this 
method, the same virtual circuit is 
provided between two users on a 
continuous basis. The circuit is 
dedicated to the specific users. No one 
else can use it and because it is 
always in place, it can be used without 
connection establishment and 
connection termination. 


(Figure: PVC, a permanent connection for the duration of the lease)


Key Points


+ Two SVC users may get a different route every time they request a
connection.


+ Two PVC users always get the same route. 
Message Switching


It is also known as store and forward. In this mechanism, a node 
receives a message, stores it, until the appropriate route is free, then sends 
it along. 

Store and forward is considered a switching technique because there is no 
direct link between the sender and receiver of a transmission. A message 
is delivered to the node along one path, then rerouted along another to its 
destination. 

In message switching, the messages are stored and relayed from
secondary storage (disk), while in packet switching the packets are stored
and forwarded from primary storage (RAM).
A store and forward switching network


Routing and Network Security 


Routing is the process of selecting paths in a network along which to send
network traffic. Routing is performed for many kinds of networks, including
the telephone network, electronic data networks and transportation
networks.

Routing can be grouped into two categories 

1. Non-adaptive routing 2. Adaptive routing 


Non-adaptive Routing 

Once the pathway to a destination has been selected, the router sends all
packets for that destination along that one route. In other words, the routing
decisions are not made based on the condition or topology of the network.


Adaptive Routing 

A router may select a new route for each packet (even packets belonging
to the same transmission) in response to changes in the condition and
topology of the network.

For this purpose, two common methods are used to calculate the shortest
path between two routers:

1. Distance Vector Routing 2. Link State Routing 


Routing Algorithms 


There are many routing algorithms 
1. Distance vector Routing 2. Link State Routing 
3. Flooding 4. Flow based Routing 


Distance Vector Routing 

In this routing scheme, each router periodically shares its knowledge about
the entire network with its neighbours. Each router has a table with
information about each network (the network ID, the cost and the router
used to reach that particular network). These tables are updated by
exchanging information with the immediate neighbours.
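The table update in distance vector routing amounts to one Bellman-Ford relaxation per advertisement received from a neighbour. In this sketch, a table maps a network ID to (cost, next router), following the text; the function name and the "-" marker for directly connected networks are hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical table layout: network ID -> (cost, next router); "-" = direct
Table = Dict[str, Tuple[int, str]]


def update_from_neighbour(table: Table, neighbour: str, link_cost: int,
                          advertised: Table) -> bool:
    """Merge a neighbour's advertised table into ours; return True if changed."""
    changed = False
    for network, (cost, _next) in advertised.items():
        new_cost = link_cost + cost      # reach the network via this neighbour
        current = table.get(network)
        better = current is None or new_cost < current[0]
        # If we already route via this neighbour, always accept its new cost.
        same_hop_update = (current is not None and current[1] == neighbour
                           and new_cost != current[0])
        if better or same_hop_update:
            table[network] = (new_cost, neighbour)
            changed = True
    return changed


if __name__ == "__main__":
    table_a = {"Net1": (1, "-")}
    update_from_neighbour(table_a, "B", 1, {"Net1": (3, "-"), "Net2": (2, "-")})
    print(table_a)  # {'Net1': (1, '-'), 'Net2': (3, 'B')}
```

Repeating this exchange periodically with every neighbour lets the costs propagate hop by hop until the tables stop changing.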


Link State Routing 


In link state routing, each router shares its knowledge of its neighbourhood 
with all routers in the network. 
When a router floods the network with information about its neighbourhood,
it is said to be advertising. The basis of this advertising is a short packet
called a Link State Packet (LSP). An LSP usually contains 4 fields: the ID of
the advertiser, the ID of the destination network, the cost and the ID of the
neighbour router.


Every router receives every LSP and puts the information into a link state
database. Because every router receives the same LSPs, every router
builds the same database. But the shortest path trees and the routing
tables are different for each router. Each router finds its own shortest
paths to the other routers by using Dijkstra’s algorithm.
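Once the link state database is built, each router can run Dijkstra's algorithm with itself as the source. This sketch assumes the database is a dictionary of routers to neighbour costs (a hypothetical representation) and returns, for each router, the total cost and the first hop to use.

```python
import heapq
from typing import Dict, Tuple

# Hypothetical link state database: router -> {neighbour: link cost}
LinkStateDB = Dict[str, Dict[str, int]]


def shortest_paths(lsdb: LinkStateDB, source: str) -> Dict[str, Tuple[int, str]]:
    """Dijkstra's algorithm from `source`.

    Returns {router: (total cost, first hop from the source)}."""
    best = {source: (0, source)}
    heap = [(0, source, source)]          # (cost so far, router, first hop)
    done = set()
    while heap:
        cost, router, first_hop = heapq.heappop(heap)
        if router in done:
            continue                      # stale entry, already finalised
        done.add(router)
        for neighbour, link_cost in lsdb.get(router, {}).items():
            new_cost = cost + link_cost
            # Leaving the source, the first hop is the neighbour itself.
            hop = neighbour if router == source else first_hop
            if neighbour not in best or new_cost < best[neighbour][0]:
                best[neighbour] = (new_cost, hop)
                heapq.heappush(heap, (new_cost, neighbour, hop))
    return best


if __name__ == "__main__":
    lsdb = {"A": {"B": 2, "C": 5}, "B": {"A": 2, "C": 1, "D": 4},
            "C": {"A": 5, "B": 1, "D": 1}, "D": {"B": 4, "C": 1}}
    print(shortest_paths(lsdb, "A"))
```

Because every router holds the same database but uses itself as the source, each one ends up with a different shortest path tree, as the text describes.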


Flooding Algorithm 


It is a non-adaptive algorithm or static algorithm. When a router receives a 
packet, it sends a copy of the packet out on each line (except the one on 
which it arrived). 


To prevent packets from looping forever, each router decrements a hop count
contained in the packet header. As soon as the hop count decrements to
zero, the router discards the packet.
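The sketch below simulates flooding with a hop count on a small topology given as adjacency lists (a hypothetical representation). It counts how many copies are sent, which shows how quickly flooding generates duplicates even when the hop count is small.

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def flood(topology: Dict[str, List[str]], source: str,
          hop_count: int) -> Tuple[int, Set[str]]:
    """Simulate flooding with a hop count.

    Every router sends a copy on each line except the one the packet arrived
    on. The receiving router decrements the hop count and discards the copy
    when the count reaches zero. Returns (copies sent, routers reached)."""
    copies = 0
    reached: Set[str] = set()
    # Each entry: (router holding a copy, line it arrived from, hop count)
    queue = deque([(source, None, hop_count)])
    while queue:
        router, arrived_from, hops = queue.popleft()
        for neighbour in topology[router]:
            if neighbour == arrived_from:
                continue                 # never send back on the incoming line
            copies += 1
            reached.add(neighbour)
            remaining = hops - 1         # decremented by the receiving router
            if remaining > 0:
                queue.append((neighbour, router, remaining))
            # remaining == 0: the receiver discards its copy
    return copies, reached


if __name__ == "__main__":
    triangle = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
    print(flood(triangle, "A", 2))
```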


Flow Based Routing Algorithm 


It is a non-adaptive routing algorithm. It takes into account both the 
topology and the load. In this routing algorithm, we can estimate the flow 
between all pairs of routers. 


Given the line capacity and the flow, we can determine the delay using the
formula for the mean delay time

$$T = \frac{1}{\mu C - \lambda}$$

where, $1/\mu$ = the mean packet size in bits,
$\lambda$ = mean number of arrivals in packets/sec
and $C$ = line capacity (bits/s).
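The delay formula can be evaluated directly. For example, with hypothetical values 1/μ = 800 bits, C = 50 kbit/s and λ = 12.5 packets/s, μC = 50000/800 = 62.5 packets/s and T = 1/(62.5 - 12.5) = 0.02 s. The sketch below (function name hypothetical) computes the same thing and rejects a saturated line, where the formula no longer applies.

```python
def mean_delay(mean_packet_bits: float, arrivals_per_sec: float,
               capacity_bps: float) -> float:
    """Mean delay T = 1 / (mu*C - lambda) in seconds.

    mean_packet_bits is 1/mu, so mu*C = capacity_bps / mean_packet_bits is the
    number of packets per second the line can transmit."""
    service_rate = capacity_bps / mean_packet_bits   # mu * C, packets/s
    if arrivals_per_sec >= service_rate:
        raise ValueError("lambda must be smaller than mu*C (line saturated)")
    return 1.0 / (service_rate - arrivals_per_sec)


if __name__ == "__main__":
    # Hypothetical line: 800-bit packets, 50 kbit/s, 12.5 packets/s offered.
    print(mean_delay(800, 12.5, 50_000))  # 0.02
```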


The Optimality Principle


It simply states that if router J is on the optimal path from router I to
router K, then the optimal path from J to K also falls along this same path.
Congestion Control

When one part of the subnet (e.g., one or more routers in an area)
becomes overloaded, congestion results. Because routers are receiving
packets faster than they can forward them, one of two things must happen:

• The subnet must prevent additional packets from entering the congested
region until those already present can be processed.

• The congested routers can discard queued packets to make room for
those that are arriving.


Factors that Cause Congestion 
• Packet arrival rate exceeds the outgoing link capacity
• Insufficient memory to store arriving packets
• Bursty traffic
• Slow processor


Congestion Control Techniques 
Several techniques can be employed for congestion control. These include

1. Warning bit
2. Choke packets
3. Load shedding
(These three deal with congestion detection and recovery.)

4. Random early detection (RED)
5. Traffic shaping
(These two deal with congestion avoidance.)


Warning Bit 

A special bit in the packet header is set by the router to warn the source
when congestion is detected. The bit is copied and piggy-backed on the
ACK and sent to the sender.

The sender monitors the number of ACK (acknowledgement) packets it
receives with the warning bit set and adjusts its transmission rate accordingly.


Choke Packets 
A choke packet is a control packet generated at a congested node and
transmitted to restrict traffic flow.


The source, on receiving the choke packet, must reduce its transmission
rate by a certain percentage.




Load Shedding 

When buffers become full, routers simply discard packets. Which packet is
chosen to be the victim depends on the application and on the error
strategy used in the data link layer.


For a file transfer, for example, we cannot discard older packets, since this
would cause a gap in the received data. For real-time voice or video, it is
probably better to throw away old data and keep new packets.


Random Early Detection (RED)

This is a proactive approach in which the router discards one or more 
packets before the buffer becomes completely full. Each time a packet 
arrives, the RED algorithm computes the average queue length, say AVG. 


Key Points

+ If AVG is below some lower threshold, congestion is assumed to be minimal or
non-existent and the packet is queued.

+ If AVG is greater than some upper threshold, congestion is assumed to be
serious and the packet is discarded.


+ If AVG is between the two thresholds, this might indicate the onset of
congestion. The probability of congestion is then calculated, and the
packet is discarded with that probability.
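The three RED cases above can be sketched as follows. The text does not say how AVG or the drop probability are computed; this sketch assumes the classic choices of an exponentially weighted moving average for AVG and a drop probability that rises linearly from 0 at the lower threshold to `max_p` at the upper one. All parameter names are hypothetical.

```python
import random


class RedQueue:
    """Drop decision of a RED queue (sketch; parameter choices are assumed)."""

    def __init__(self, min_th: float, max_th: float, max_p: float,
                 weight: float, rng: random.Random) -> None:
        self.min_th = min_th      # lower threshold for AVG
        self.max_th = max_th      # upper threshold for AVG
        self.max_p = max_p        # drop probability reached at max_th
        self.weight = weight      # EWMA weight given to the newest sample
        self.avg = 0.0            # AVG, the average queue length
        self.rng = rng

    def should_drop(self, queue_len: int) -> bool:
        """Update AVG for an arriving packet and decide whether to drop it."""
        self.avg = (1 - self.weight) * self.avg + self.weight * queue_len
        if self.avg < self.min_th:
            return False          # little or no congestion: queue the packet
        if self.avg > self.max_th:
            return True           # serious congestion: discard the packet
        # Onset of congestion: drop with a probability that grows with AVG.
        fraction = (self.avg - self.min_th) / (self.max_th - self.min_th)
        return self.rng.random() < self.max_p * fraction


if __name__ == "__main__":
    red = RedQueue(min_th=5, max_th=15, max_p=0.1, weight=0.2, rng=random.Random(7))
    print([red.should_drop(q) for q in (2, 8, 12, 18, 25)])
```

Dropping a few packets early, before the buffer is full, signals senders to slow down while there is still room to absorb bursts.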


Traffic Shaping 

• Another method of congestion control is to shape the traffic before it
enters the network.

• It controls the rate at which packets are sent (not just how many). It is
used in ATM and integrated services networks.

• At connection setup time, the sender and the carrier negotiate a traffic
pattern (shape).
Two traffic shaping algorithms are as follows

(i) Leaky Bucket (ii) Token Bucket


The Leaky Bucket (LB) Algorithm 

The Leaky Bucket algorithm is used to control the rate in the network. It is
implemented as a single-server queue with constant service time. If the
buffer (bucket) overflows, packets are discarded.
(a) A leaky bucket with water (b) A leaky bucket with packets
The leaky bucket enforces a constant output rate (average rate) regardless
of the burstiness of the input. It does nothing when the input is idle.


When packets are of the same size (as in ATM cells), the host should inject 
one packet per clock tick onto the network. But for variable length packets, 
it is better to allow a fixed number of bytes per tick. 
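A tick-by-tick simulation makes the constant output rate visible. The sketch assumes same-size packets (as with ATM cells), a bucket measured in packets and a fixed number of packets released per tick; the function and parameter names are hypothetical. For variable-length packets, the same loop would count bytes instead of packets.

```python
from typing import List, Tuple


def leaky_bucket(arrivals: List[int], capacity: int,
                 rate: int) -> Tuple[List[int], int]:
    """Simulate a packet-counting leaky bucket.

    arrivals[t]: packets arriving at tick t.
    capacity: bucket (queue) size in packets; rate: packets sent per tick.
    Returns (packets sent at each tick, packets discarded)."""
    queued = 0
    dropped = 0
    sent_per_tick = []
    for arriving in arrivals:
        accepted = min(arriving, capacity - queued)
        dropped += arriving - accepted      # bucket overflows: discard
        queued += accepted
        out = min(rate, queued)             # constant output rate, never more
        queued -= out
        sent_per_tick.append(out)
    return sent_per_tick, dropped


if __name__ == "__main__":
    # A burst of 5 packets into a 4-packet bucket draining 1 packet per tick.
    print(leaky_bucket([5, 0, 0, 0, 0], capacity=4, rate=1))  # ([1, 1, 1, 1, 0], 1)
```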


Token Bucket (TB) Algorithm 


In contrast to LB, the token bucket algorithm allows the output rate to vary
depending on the size of the burst.
Key Points


+ In the TB algorithm, the bucket holds tokens.
+ To transmit a packet, the host must capture and destroy one token.
+ Tokens are generated by a clock at the rate of one token every ΔT sec.


+ Idle hosts can capture and save up tokens (up to the maximum size of the
bucket) in order to send larger bursts later.
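The token bucket can be simulated in the same tick-by-tick style. This sketch assumes one token per packet, a clock that adds `tokens_per_tick` tokens each tick up to `bucket_size`, and an unbounded queue of waiting packets; these names are illustrative.

```python
from typing import List, Tuple


def token_bucket(arrivals: List[int], bucket_size: int, tokens_per_tick: int,
                 initial_tokens: int = 0) -> Tuple[List[int], int]:
    """Simulate a token bucket with an unbounded packet queue.

    Each packet needs one token. Tokens accumulate (up to bucket_size) while
    the host is idle, so a later burst can be sent at once.
    Returns (packets sent at each tick, packets still waiting at the end)."""
    tokens = initial_tokens
    waiting = 0
    sent_per_tick = []
    for arriving in arrivals:
        tokens = min(bucket_size, tokens + tokens_per_tick)  # clock adds tokens
        waiting += arriving
        out = min(waiting, tokens)   # capture and destroy one token per packet
        tokens -= out
        waiting -= out
        sent_per_tick.append(out)
    return sent_per_tick, waiting


if __name__ == "__main__":
    # Three idle ticks, then a burst of 5 packets.
    print(token_bucket([0, 0, 0, 5, 0], bucket_size=3, tokens_per_tick=1))  # ([0, 0, 0, 3, 1], 1)
```

Tokens saved up during the idle ticks let the host send three packets at once, whereas a leaky bucket with the same rate would release only one packet per tick.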
Protocols Used in the Network Layer


Some of the important protocols used in network layer with their functions 
are given below 


Address Resolution Protocol (ARP) 


ARP is used to find the physical address of a node when its Internet
address is known. Any time a host or a router needs to find the physical
address of another host on its network, it formats an ARP query packet that
includes that IP address and broadcasts it over the network. Every host on
the network receives and processes the ARP packet, but only the intended
recipient recognises its Internet address and sends back its physical
address.
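To make the query concrete, the sketch below packs the 28-byte ARP request body for Ethernet and IPv4 using the RFC 826 field layout. It only builds the bytes; actually broadcasting it (inside an Ethernet frame addressed to ff:ff:ff:ff:ff:ff) needs raw-socket access, which is not shown. The function name is hypothetical.

```python
import struct


def build_arp_request(sender_mac: bytes, sender_ip: bytes, target_ip: bytes) -> bytes:
    """Build the 28-byte ARP request payload for Ethernet/IPv4.

    The target MAC is what we are asking for, so it is left as all zeros."""
    if len(sender_mac) != 6 or len(sender_ip) != 4 or len(target_ip) != 4:
        raise ValueError("expected a 6-byte MAC and 4-byte IPv4 addresses")
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,             # hardware type: Ethernet
        0x0800,        # protocol type: IPv4
        6,             # hardware address length
        4,             # protocol address length
        1,             # operation: 1 = request (2 = reply)
        sender_mac,
        sender_ip,
        b"\x00" * 6,   # target hardware address: unknown
        target_ip,
    )


if __name__ == "__main__":
    pkt = build_arp_request(bytes.fromhex("001122334455"),
                            bytes([141, 14, 2, 21]), bytes([141, 14, 2, 20]))
    print(len(pkt), pkt.hex())
```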


Reverse Address Resolution Protocol (RARP) 


This protocol allows a host to discover its Internet address when it knows
only its physical address. RARP works much like ARP. The host wishing to
retrieve its Internet address broadcasts a RARP query packet that
contains its physical address to every host on its physical network. A server
on the network recognises the RARP packet and returns the host's Internet
address.


Internet Control Message Protocol (ICMP)


ICMP is a mechanism used by hosts and routers to send notifications
of datagram problems back to the sender. IP is essentially an unreliable
and connectionless protocol. ICMP allows IP (Internet Protocol) to inform a
sender if a datagram is undeliverable.

ICMP uses echo request/reply to test whether a destination is reachable and
responding. It handles both control and error messages, but its sole
function is to report problems, not correct them.
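An ICMP echo request (type 8, code 0) and its Internet checksum can be built as in the following sketch. It assumes the standard 8-byte echo header (type, code, checksum, identifier, sequence number); sending it would require a raw socket and usually administrator privileges, so that part is omitted. Function names are hypothetical.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP (and IP, UDP, TCP)."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd length with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes) -> bytes:
    """ICMP echo request: type 8, code 0 (an echo reply is type 0)."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, identifier, sequence) + payload


if __name__ == "__main__":
    pkt = build_echo_request(0x1234, 1, b"ping")
    print(pkt.hex())
```

A receiver can verify the packet by recomputing the checksum over the whole message, checksum field included; a valid packet gives 0.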


Internet Group Management Protocol (IGMP)


IP can be involved in two types of communication: unicasting and
multicasting. The IGMP protocol has been designed to help a multicast
router to identify the hosts in a LAN that are members of a multicast group.
Addressing at Network Layer


In addition to the physical addresses that identify individual devices, the
Internet requires an additional addressing convention: an address that
identifies the connection of a host to its network. Every host and router on
the Internet has an IP address, which encodes its network number and host
number. The combination is unique: in principle, no two machines on the
Internet have the same IP address.


Classes and Subnetting 


There are currently five different field-length patterns in use, each defining a
class of address.


An IP address is 32 bit long. One portion of the address indicates a network 
(Net ID) and the other portion indicates the host (or router) on the network 
(i.e., Host ID). 


To reach a host on the Internet, we must first reach the network, using the 
first portion of the address (Net ID). Then, we must reach the host itself, 
using the 2nd portion (Host ID). 


A network can be further divided into smaller networks called subnetworks.


Address layout by class (Byte 1 | Byte 2 | Byte 3 | Byte 4):
Class A | Net ID: byte 1 | Host ID: bytes 2 to 4
Class B | Net ID: bytes 1 and 2 | Host ID: bytes 3 and 4
Class C | Net ID: bytes 1 to 3 | Host ID: byte 4
Class D | Multicast address
Class E | Reserved for future use
For Class A, the first bit of the Net ID should be 0, as in the following
pattern: 01111011 10001111 11111100 11001111


For Class B, the first 2 bits of the Net ID should be 1 and 0, respectively, as
in the pattern below: 10011101 10001111 11111100 11001111


For Class C, the first 3 bits of the Net ID should be 1, 1 and 0, respectively,
as follows: 11011101 10001111 11111100 11001111


For Class D, the first 4 bits should be 1110, as in the pattern
11101011 10001111 11111100 11001111


For Class E, the first 4 bits should be 1111, like
11110101 10001111 11111100 11001111
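The leading-bit rules above translate directly into code. This sketch (hypothetical function name) classifies a dotted-decimal address by its first byte only; note that classful addressing has since been largely replaced by classless (CIDR) addressing, so the class is mostly of historical and exam interest.

```python
def address_class(address: str) -> str:
    """Return the class (A to E) of a dotted-decimal IPv4 address,
    judged by the leading bits of the first byte."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError("not a dotted-decimal IPv4 address: " + address)
    first = octets[0]
    if (first & 0b10000000) == 0:
        return "A"          # 0xxxxxxx
    if (first & 0b11000000) == 0b10000000:
        return "B"          # 10xxxxxx
    if (first & 0b11100000) == 0b11000000:
        return "C"          # 110xxxxx
    if (first & 0b11110000) == 0b11100000:
        return "D"          # 1110xxxx (multicast)
    return "E"              # 1111xxxx (reserved)


if __name__ == "__main__":
    # The example bit patterns from the text, written in dotted decimal.
    for addr in ["123.143.252.207", "157.143.252.207", "221.143.252.207",
                 "235.143.252.207", "245.143.252.207"]:
        print(addr, address_class(addr))
```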
Class Ranges of Internet Address in Dotted Decimal Format

Class | From | To | Notes
A | 0.0.0.0 | 127.255.255.255 | Net ID: 1st byte; Host ID: last 3 bytes
B | 128.0.0.0 | 191.255.255.255 | Net ID: first 2 bytes; Host ID: last 2 bytes
C | 192.0.0.0 | 223.255.255.255 | Net ID: first 3 bytes; Host ID: last byte
D | 224.0.0.0 | 239.255.255.255 | Group address
E | 240.0.0.0 | 255.255.255.255 | Undefined


Three Levels of Hierarchy


Adding subnetworks creates an intermediate level of hierarchy in the IP
addressing system. Now, we have three levels: net ID, subnet ID and host
ID. e.g., in the address 141.14.2.21, 141.14 is the net ID, 2 is the subnet ID
and 21 is the host ID. The net ID and subnet ID together give subnetwork
access, and the host ID gives host access.

(Figure: a network with two levels of hierarchy (not subnetted): network
141.14.0.0, with hosts such as 141.14.2.20, 141.14.2.21, 141.14.7.44,
141.14.7.96, 141.14.7.105, 141.14.22.8 and 141.14.22.64, connected to the
rest of the network)

(Figure: a network with three levels of hierarchy (subnetted))


Masking 


Masking is a process that extracts the address of the physical network from
an IP address. Masking can be done whether we have subnetting or not. If
we have not subnetted the network, masking extracts the network address
from an IP address. If we have subnetted, masking extracts the subnetwork
address from an IP address.


Masks without Subnetting 
To be compatible, routers use a mask even if there is no subnetting.

e.g., IP address 141.14.2.21
Mask 255.255.0.0
Network address 141.14.0.0
Masks with Subnetting 


When there is subnetting, the masks can vary:

e.g., IP address 141.14.2.21
Mask 255.255.255.0
Subnetwork address 141.14.2.0
Masks for Unsubnetted Networks

Class | Mask | Address (Example) | Network Address
A | 255.0.0.0 | 15.32.56.7 | 15.0.0.0
B | 255.255.0.0 | 135.67.13.9 | 135.67.0.0
C | 255.255.255.0 | 201.34.12.72 | 201.34.12.0
D | N/A | N/A | N/A
E | N/A | N/A | N/A

Masks for Subnetted Networks

Class | Mask | Address (Example) | Subnetwork Address
A | 255.255.0.0 | 15.32.56.7 | 15.32.0.0
B | 255.255.255.0 | 135.67.13.9 | 135.67.13.0
C | 255.255.255.192 | 201.34.12.72 | 201.34.12.64
D | N/A | N/A | N/A
E | N/A | N/A | N/A
Types of Masking 


There are two types of masking as given below 


Boundary Level Masking 

If the masking is at the boundary level (the mask numbers are either 255 or
0), finding the subnetwork address is very easy. Follow these 2 rules:

• The bytes in the IP address that correspond to 255 in the mask will be
repeated in the subnetwork address.

• The bytes in the IP address that correspond to 0 in the mask will change
to 0 in the subnetwork address.

e.g., IP address 45.23.21.8
Mask 255.255.0.0
Subnetwork address 45.23.0.0


Non-boundary Level Masking 

If the masking is not at the boundary level (the mask numbers are not just
255 or 0), finding the subnetwork address involves using the bit-wise AND
operator. Follow these 3 rules:

• The bytes in the IP address that correspond to 255 in the mask will be
repeated in the subnetwork address.

• The bytes in the IP address that correspond to 0 in the mask will change
to 0 in the subnetwork address.

• For other bytes, use the bit-wise AND operator.

e.g., IP address 213.23.47.37
Mask 255.255.255.240
Subnetwork address 213.23.47.32

As we can see, 3 bytes are easy to determine. However, the 4th byte
needs the bit-wise AND operation:

37 = 00100101
240 = 11110000
37 AND 240 = 00100000 = 32
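The masking rules reduce to a byte-wise AND, which the sketch below applies to every byte (AND with 255 keeps a byte and AND with 0 clears it, so the rules collapse into one operation). As a cross-check it also uses Python's standard `ipaddress` module; the helper names are hypothetical.

```python
import ipaddress


def apply_mask(address: str, mask: str) -> str:
    """AND each byte of a dotted-decimal IPv4 address with the mask byte."""
    addr_bytes = [int(part) for part in address.split(".")]
    mask_bytes = [int(part) for part in mask.split(".")]
    # 255 keeps the byte, 0 clears it, other values need a genuine AND.
    return ".".join(str(a & m) for a, m in zip(addr_bytes, mask_bytes))


def apply_mask_stdlib(address: str, mask: str) -> str:
    """Same result using the standard library (strict=False allows host bits)."""
    network = ipaddress.ip_network(f"{address}/{mask}", strict=False)
    return str(network.network_address)


if __name__ == "__main__":
    print(apply_mask("213.23.47.37", "255.255.255.240"))         # 213.23.47.32
    print(apply_mask_stdlib("213.23.47.37", "255.255.255.240"))  # 213.23.47.32
```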








Transport Layer Protocols 


There are two transport layer protocols as given below 


UDP (User Datagram Protocol) 


UDP is a connectionless protocol. UDP provides a way for applications to
send encapsulated IP datagrams without having to establish a connection.


UDP transmits segments consisting of an 8-byte header followed by the
payload. The two ports in the header serve to identify the end points within
the source and destination machines. When a UDP packet arrives, its payload
is handed to the process attached to the destination port.





Source port address (16 bits) | Destination port address (16 bits)
Total length of the user datagram (16 bits) | Checksum, used for error detection (16 bits)
UDP datagram format
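The 8-byte UDP header can be packed and unpacked with Python's `struct` module, as in this sketch. It leaves the checksum at 0, which in IPv4 means that no checksum was computed; a real stack computes it over a pseudo-header plus the datagram. Function names are hypothetical.

```python
import struct
from typing import Tuple

UDP_HEADER_LEN = 8


def build_udp_datagram(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prefix the payload with an 8-byte UDP header (checksum left at 0)."""
    total_length = UDP_HEADER_LEN + len(payload)   # header + data, in bytes
    header = struct.pack("!HHHH", src_port, dst_port, total_length, 0)
    return header + payload


def parse_udp_header(datagram: bytes) -> Tuple[int, int, int, int]:
    """Return (source port, destination port, total length, checksum)."""
    return struct.unpack("!HHHH", datagram[:UDP_HEADER_LEN])


if __name__ == "__main__":
    d = build_udp_datagram(5000, 53, b"query")
    print(parse_udp_header(d))  # (5000, 53, 13, 0)
```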


TCP (Transmission Control Protocol) 


TCP provides full transport layer services to applications. TCP is a reliable
stream transport, port-to-port protocol. The term stream, in this context,
means connection-oriented: a connection must be established between
both ends of a transmission before either may transmit data. By creating this
connection, TCP generates a virtual circuit between sender and receiver
that is active for the duration of the transmission.


Source port address (16 bits) | Destination port address (16 bits)
Sequence number (32 bits)
Acknowledgement number (32 bits)
HLEN (4 bits) | Reserved (6 bits) | URG ACK PSH RST SYN FIN (6 bits) | Window size (16 bits)
Checksum (16 bits) | Urgent pointer (16 bits)
Options and padding
TCP segment format
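The fixed 20-byte part of the TCP header (no options) can be packed in the same way. In this sketch, HLEN is counted in 32-bit words, so 5 means 20 bytes, and the six control flags occupy the low bits of the same 16-bit field as HLEN and the reserved bits. The function names and flag table are illustrative assumptions; the checksum is left at 0.

```python
import struct
from typing import Iterable

# Control flag bit values in the low 6 bits of the HLEN/reserved/flags field.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def build_tcp_header(src_port: int, dst_port: int, seq: int, ack: int,
                     flags: Iterable[str], window: int) -> bytes:
    """Pack a 20-byte TCP header without options (checksum left at 0)."""
    hlen_words = 5                                      # 5 x 4 bytes = 20 bytes
    control = 0
    for name in flags:
        control |= FLAG_BITS[name]
    hlen_reserved_flags = (hlen_words << 12) | control  # reserved bits stay 0
    return struct.pack("!HHIIHHHH", src_port, dst_port, seq, ack,
                       hlen_reserved_flags, window,
                       0, 0)                            # checksum, urgent pointer


def header_length_bytes(header: bytes) -> int:
    """Read HLEN (top 4 bits of byte 12) and convert words to bytes."""
    return (header[12] >> 4) * 4


if __name__ == "__main__":
    syn = build_tcp_header(1234, 80, 100, 0, ["SYN"], 65535)
    print(len(syn), header_length_bytes(syn))  # 20 20
```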


Each machine supporting TCP has a TCP transport entity: either a library
procedure, a user process or part of the kernel. In all cases, it manages TCP
streams and interfaces to the IP layer. A TCP entity accepts user data
streams from local processes, breaks them up into pieces not exceeding
64 KB and sends each piece as a separate IP datagram.


Key Points
+ When datagrams containing TCP data arrive at a machine, they are given to
the TCP entity, which reconstructs the original byte stream.
+ All TCP connections are full duplex and point-to-point, and a TCP connection
is a byte stream, not a message stream.


Domain Name System (DNS) 


To identify an entity, the TCP/IP protocol suite uses the IP address, which
uniquely identifies the connection of a host to the Internet. However, people
prefer to use names instead of addresses. Therefore, we need a system that
can map a name to an address and, conversely, an address to a name. In
TCP/IP, this is the Domain Name System.


DNS in the Internet 

DNS is a protocol that can be used on different platforms. The domain name
space is divided into three categories:

• Generic Domain The generic domain defines registered hosts
according to their generic behaviour. Each node in the tree defines a
domain, which is an index to the domain name space database.

(Figure: generic domain)

• Country Domain The country domain section follows the same format as
the generic domain but uses 2-character country abbreviations (e.g., US
for United States) in place of the 3-character generic labels.

• Inverse Domain The inverse domain is used to map an address to a name.
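For the inverse domain, an IPv4 address is looked up by writing its bytes in reverse order under the in-addr.arpa domain. The sketch below builds that name by hand and compares it with the `reverse_pointer` attribute of Python's standard `ipaddress` module; the function name is hypothetical, and no actual DNS query is made.

```python
import ipaddress


def inverse_domain_name(address: str) -> str:
    """Name queried in the inverse domain for an IPv4 address: the bytes are
    reversed and the suffix in-addr.arpa is appended."""
    reversed_bytes = ".".join(reversed(address.split(".")))
    return reversed_bytes + ".in-addr.arpa"


if __name__ == "__main__":
    print(inverse_domain_name("141.14.2.21"))                   # 21.2.14.141.in-addr.arpa
    print(ipaddress.ip_address("141.14.2.21").reverse_pointer)  # 21.2.14.141.in-addr.arpa
```

Reversing the bytes puts the most general part of the address (the network) at the right, matching the way domain names grow more general from left to right.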


Application Layer Protocols 


There are various application layer protocols as given below 


SMTP (Simple Mail Transfer Protocol) 


One of the most popular network services is electronic mail (e-mail). The
TCP/IP protocol that supports electronic mail on the Internet is called
Simple Mail Transfer Protocol (SMTP).


SMTP is a system for sending messages to other computer users based on
e-mail addresses. SMTP provides services for mail exchange between
users on the same or different computers.


TELNET (Terminal Network) 


TELNET is a client-server application that allows a user to log on to a remote
machine and lets the user access any application program on the remote
computer.

TELNET uses the NVT (Network Virtual Terminal) system to encode 
characters on the local system. On the server (remote) machine, NVT 
decodes the characters to a form acceptable to the remote machine. 


FTP (File Transfer Protocol) 

FTP is the standard mechanism provided by TCP/IP for copying a file from 
one host to another. 

FTP differs from other client-server applications because it establishes
2 connections between hosts. One connection is used for data transfer, the
other for control information (commands and responses).
Multipurpose Internet Mail Extensions (MIME)
It is an extension of SMTP that allows the transfer of multimedia messages. 


POP (Post Office Protocol) 


This is a protocol used by a mail server in conjunction with SMTP to receive
and hold mail for hosts.


HTTP (Hypertext Transfer Protocol) This is a protocol used mainly to
access data on the World Wide Web (WWW), a repository of information
spread all over the world and linked together.


The HTTP protocol transfers data in the form of plain text, hypertext, audio,
video and so on.


Important Terms Related to Computer Security 


• Computer Security It is a generic name for the collection of tools designed
to protect data and thwart hackers.
• Network Security It consists of measures to protect data during their
transmission.
• Internet Security It consists of measures to protect data during their
transmission over a collection of interconnected networks.
• Information Security We consider 3 aspects of information security:
(a) Security attack (b) Security mechanism (c) Security service


Security Services 


There are two main definitions of a security service:


RFC 2828 It defines a security service as a processing or communication
service provided by a system to give a specific kind of protection to system
resources.


X.800 It defines a security service as a service provided by a protocol layer
of communicating open systems, which ensures adequate security of the
systems or of data transfers.


X.800 Classification 


X.800 divides security services into five major categories:
• Authentication Assurance that the communicating entity is the one claimed.
• Data Confidentiality Protection of data from unauthorised disclosure.
• Data Integrity Assurance that data received is as sent by an authorised
entity.
• Non-Repudiation Protection against denial by one of the parties in a
communication.
• Access Control Prevention of the unauthorised use of resources.
Security Mechanism


A feature designed to detect, prevent or recover from a security attack.
The two categories of security mechanism are as follows


Specific Security Mechanism

Encipherment, digital signature, access control, data integrity,
authentication exchange, traffic padding, routing control, notarisation.


Pervasive Security Mechanism


Trusted functionality, security labels, event detection, security audit trails,
security recovery.


Security Attacks 

It is an action that compromises the security of information owned by an
organisation.

There are two types of security attacks


Passive Attacks
Eavesdropping on, or monitoring of, transmissions to
• Obtain message contents
• Monitor traffic flows


Active Attacks


In computers and computer networks, an attack is any attempt to destroy,
expose, alter, disable, steal or gain unauthorized access to, or make
unauthorized use of, an asset.


Active attacks involve modification of the data stream to

• Masquerade as some other entity
• Replay previous messages
• Modify messages in transit
• Cause denial of service


Security attacks
• Active attacks: masquerade, replay, denial of service, modification of message contents
• Passive attacks: traffic analysis, release of message contents


Security attacks classification
Model for Network Security


Network security starts with authentication, commonly with a username and password. Since this requires just one detail besides the username, i.e., the password, it is sometimes termed one-factor authentication.


[Figure: The sender and the receiver each apply a security-related transformation to the message using secret information. The message travels over an information channel where an opponent may attack it. A trusted third party (e.g., an arbiter or a distributor of secret information) supports both principals.]


A general Network Security Model (NSM)


Using this model requires us to

• Design a suitable algorithm for the security transformation.
• Generate the secret information (keys) used by the algorithm.
• Develop methods to distribute and share the secret information.
• Specify a protocol enabling the principals to use the transformation and secret information for a security service.


Cryptography 


Cryptography is the science of converting a stream of text into coded form in such a way that only the sender and the intended receiver of the coded text can decode it.


Nowadays, computer use requires automated tools to protect files and other stored information. Use of networks and communication links requires measures to protect data during transmission.


Symmetric/Private Key Cryptography (Conventional/Private Key/Single Key)

Symmetric key algorithms are a class of algorithms for cryptography that use the same cryptographic key for both encryption of plaintext and decryption of ciphertext. The keys may be identical, or there may be a simple transformation to go between the two keys.

In symmetric (private key) cryptography, the following key features are involved

• Sender and recipient share a common key.
• It was the only type of cryptography in use prior to the invention of public key cryptography in the 1970s.
• If this shared key is disclosed to an opponent, communications are compromised.
• It does not protect the sender from the receiver forging a message and claiming it was sent by the sender.


[Figure: Plaintext input → encryption algorithm (e.g., DES) → transmitted ciphertext → decryption algorithm (reverse of encryption algorithm) → plaintext output. Both algorithms use the secret key shared by the sender and the recipient.]


Symmetric cipher model (used in symmetric encryption)


Asymmetric/Public Key Cryptography 


Public key cryptography refers to a cryptographic system requiring two separate keys, one of which is secret (private) and one of which is public. Although different, the two parts of the key pair are mathematically linked.
• Public Key A public key, which may be known by anybody, can be used to encrypt messages and verify signatures.

• Private Key A private key, known only to the recipient, is used to decrypt messages and sign (create) signatures.


It is asymmetric because those who encrypt messages or verify signatures cannot decrypt messages or create signatures. It is computationally infeasible to find the decryption key knowing only the algorithm and the encryption key.


Either of the two related keys can be used for encryption, with the other 
used for decryption (in some schemes). 





[Figure: Bob encrypts the plaintext input with Alice's public key using the encryption algorithm (e.g., RSA). The transmitted ciphertext is decrypted with Alice's private key by the decryption algorithm (reverse of the encryption algorithm) to produce the plaintext output.]


Asymmetric cipher model
In the above public key cryptography model
• Bob encrypts a plaintext message with Alice’s public key using the encryption algorithm and sends it over the communication channel.
• On the receiving side, only Alice can decrypt this text, because only she has Alice’s private key.


Message Authentication Codes (MAC) 


In cryptography, a Message Authentication Code (MAC) is a short piece of information used to authenticate a message and to provide integrity and authenticity assurance on the message. Integrity assurance detects accidental and intentional message changes, while authenticity assurance affirms the message’s origin.


A MAC is a keyed function of a message: the sender of a message m computes MAC(m) and appends it to the message.

Verification The receiver also computes MAC(m) and compares it with the received value.
Security of MAC An attacker should not be able to generate a valid pair (m, MAC(m)), even after seeing many valid message/MAC pairs, possibly for messages of his choice.


MAC from a Block Cipher 

A MAC can be obtained from a block cipher using the following steps

• Divide the message into blocks.
• Compute a checksum by adding (or XORing) the blocks.
• Encrypt the checksum.


Key Points


• MAC keys are symmetric. Hence, a MAC does not provide non-repudiation (unlike digital signatures).
• A MAC function does not need to be invertible.
• A MACed message is not necessarily encrypted.


RSA Algorithm 


RSA is an algorithm for public key cryptography. The RSA (Rivest-Shamir-Adleman) algorithm was publicly described in 1977.


Mathematical Background of RSA Algorithm 


Extended Euclidean algorithm Given x, find y such that $x \cdot y \equiv 1 \pmod{m}$. The extended Euclidean algorithm can efficiently find the solution to this problem (a solution exists only when $\gcd(x, m) = 1$).


Euler’s Theorem For any number a relatively prime to $n = pq$ (p and q distinct primes),
$a^{(p-1)(q-1)} \equiv 1 \pmod{pq}$
(i) Why is this very useful?
(ii) Let $z = k(p-1)(q-1) + r$; then
$a^z = \left(a^{(p-1)(q-1)}\right)^k \cdot a^r \equiv 1^k \cdot a^r \equiv a^r \pmod{pq}$
(iii) In other words, if $z \equiv r \pmod{(p-1)(q-1)}$, then $a^z \equiv a^r \pmod{pq}$.
Special case If $z \equiv 1 \pmod{(p-1)(q-1)}$, then $a^z \equiv a \pmod{pq}$.


We can use Euler’s theorem to simplify $a^z \bmod pq$.
RSA Algorithm
(i) Let $n = pq$, where p and q are two large primes.
(ii) Public key $(e, n)$, where e is relatively prime to $(p-1)(q-1)$.
(iii) Private key $(d, n)$, such that $ed \equiv 1 \pmod{(p-1)(q-1)}$; d can be calculated using the extended Euclidean algorithm.

Encryption $c = m^e \bmod n$

Decryption $c^d = (m^e)^d = m^{ed} \equiv m \pmod{n}$
The security of RSA depends on the hardness of factoring: factoring $n = p \times q$ is hard when n is large.


DES (Data Encryption Standard) 


The Data Encryption Standard (DES) was developed at IBM. DES is a symmetric key cryptosystem that uses a 56-bit key and encrypts 64-bit plaintext blocks into 64-bit ciphertext blocks. To improve on DES, Triple DES (3DES) was introduced.


Compiler Design 


Compiler 


A compiler is a program that reads a program written in one language (i.e., the source language) and translates it into an equivalent program in a target language.


This translation process also reports the errors in the source program, if any.


[Figure: Source program → Compiler → target program (if there are no errors in the source program), or error messages (if errors are present in the source program).]


Diagrammatic view of working of a compiler


Major Parts of a Compiler 
There are two major parts of a compiler 
1. Analysis 2. Synthesis 


■ In the analysis phase, an intermediate representation is created from the given source program. Lexical analyzer, syntax analyzer and semantic analyzer are the phases in this part.

■ In the synthesis phase, the equivalent target program is created from this intermediate representation. Intermediate code generator, code generator and code optimizer are the phases in this part.
Phases of a Compiler


Each phase shown in the figure (phases of a compiler) transforms the program from one representation into another. All phases communicate with the error handler and the symbol table.


Lexical Analyzer 


Lexical analyzer reads the source program character by character and 
returns the tokens of the source program. It puts information about 
identifiers into the symbol table. 


Some typical tasks of a lexical analyzer include recognising reserved keywords, ignoring comments, finding integer and floating point constants, counting the number of lines, finding identifiers (variables), finding string and character constants, reporting error messages and treating white space (blank, tab and newline characters) appropriately.


[Figure: {Source program} → lexical analyzer → syntax analyzer → semantic analyzer (analysis phase) → intermediate code generator → code optimiser → code generator (synthesis phase) → {Target program}. Every phase communicates with the error handler and the symbol table manager.]


Phases of a compiler
Tokens, Lexemes and Patterns

• A token describes a pattern of characters having the same meaning in the source program, such as identifiers, operators, keywords, numbers, delimiters and so on. A token may have a single attribute which holds the required information for that token. For identifiers, this attribute is a pointer to the symbol table, and the symbol table holds the actual attributes for that token.

• The token type and its attribute uniquely identify a lexeme.

• Regular expressions are widely used to specify patterns.


Note Since a token can represent more than one lexeme, additional information (i.e., an attribute of the token) must be held for the specific lexeme.


Specification of Tokens 

Regular expressions are an important notation for specifying patterns. Each pattern matches a set of strings, so regular expressions serve as names for sets of strings.


Recognition of Tokens 

Lexical analyzers are based on a simple computational model called the finite state automaton.

A transition diagram depicts the actions that take place when the lexical analyzer is called by the parser to get the next token.


Syntax Analyzer (Parser) 

The syntax analyzer creates the syntactic structure of the given source program. This syntactic structure is mostly a parse tree. The syntax of a programming language is described by a Context-Free Grammar (CFG). We will use BNF (Backus-Naur Form) notation in the description of CFGs.


The syntax analyzer (parser) checks whether a given source program 
satisfies the rules implied by a context-free grammar or not. If it satisfies, 
the parser creates the parse tree of that program. Otherwise the parser 
gives the error messages. 


Key Points


• A context-free grammar gives a precise syntactic specification of a programming language.

• The design of the grammar is an initial phase of the design of a compiler.

• A grammar can be directly converted into a parser by some tools. The parser works on the stream of tokens, which are the smallest items of the language.

• The parser sends a get next token command to the lexical analyzer to identify the next token.




[Figure: Source program → lexical analyzer → token → parser → parse tree. The parser sends get next token requests back to the lexical analyzer.]


Syntax analyzer block diagram


We categorise parsers into two groups
(i) Top-down parser (starts from the root).
(ii) Bottom-up parser (starts from the leaves).
• Both top-down and bottom-up parsers scan the input from left to right (one symbol at a time).
• Efficient top-down and bottom-up parsers can be implemented only for subclasses of context-free grammars.
(i) LL for top-down parsing
(ii) LR for bottom-up parsing


Context-Free Grammars 

Inherently recursive structures of a programming language are defined by a CFG. In a CFG, we have
• A start symbol (one of the non-terminals).
• A finite set of terminals (in our case, this will be the set of tokens).
• A set of non-terminals (syntactic variables).
• A finite set of production rules in the following form


A → α, where A is a non-terminal and α is a string of terminals and non-terminals (including the empty string).


Derivations 
A derivation step replaces an instance of a non-terminal in a given string by the right-hand side of a production rule whose left-hand side is that non-terminal. Derivation produces a new string from a given string.
If the derived string contains only terminal symbols, then no further derivation is possible.
E ⇒ -E is read as “E derives -E”.

Each derivation step needs to choose a non-terminal to rewrite and a production rule to apply. A leftmost derivation always chooses the leftmost non-terminal to rewrite.

e.g., E ⇒ E + E ⇒ id + E ⇒ id + id
A rightmost derivation always chooses the rightmost non-terminal to rewrite.


e.g., E ⇒ E + E ⇒ E + id ⇒ id + id
Parse Trees


A parse tree is a graphical representation of a derivation that filters out the order in which non-terminals are chosen for rewriting. The root node represents the start symbol, and the inner nodes of a parse tree are non-terminal symbols.


The leaves represent terminal symbols.


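e.g., Leftmost derivation
E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)

[Figure: parse tree for -(id + id). The root E has children - and E; that E has children (, E and ); the innermost E has children E, + and E, each of which has the single child id.]
Parse tree diagram

Ambiguity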


A grammar that produces more than one parse tree for some sentence is called an ambiguous grammar. In an unambiguous grammar, every sentence has a unique parse tree.

We should eliminate the ambiguity in the grammar during the design 
phase of the compiler. Ambiguous grammars can be disambiguated 
according to the precedence and associativity rules. 


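e.g., Consider the grammar
E → E + E | E * E | id
and the sentence id + id * id.


If we start with E ⇒ E + E, then the parse tree will be created as follows
[Figure: the root E has children E, + and E; the left E derives id; the right E has children E, * and E, each deriving id.]
Parse tree


If we start with E ⇒ E * E, then the parse tree will be created as follows
[Figure: the root E has children E, * and E; the left E has children E, + and E, each deriving id; the right E derives id.]
Parse tree


Note The above grammar is ambiguous, as we get more than one parse tree for the same sentence.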
Removing Ambiguity
Consider an example of an ambiguous grammar as given below
E → E + E | E * E | E ^ E | id | (E)
We can use the precedence of operators (highest first) as follows
^ (right to left)
* (left to right)
+ (left to right)


Using the above operator precedence and associativity, we get the following unambiguous grammar


E → E + T | T
T → T * F | F
F → G ^ F | G
G → id | (E)
Left Recursion 


A grammar is left recursive if it has a non-terminal A such that there is a derivation


A ⇒⁺ Aα for some string α


Top-down parsing technique can’t handle left recursive grammars. So, we 
have to convert our left recursive grammar into an equivalent grammar 
which is not left recursive. 


The left-recursion may appear in a single step of the derivation (immediate 
left recursion) or may appear in more than one step of the derivation. 
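The standard rewrite for immediate left recursion turns A → Aα₁ | ... | Aαₘ | β₁ | ... | βₙ into A → β₁A′ | ... | βₙA′ and A′ → α₁A′ | ... | αₘA′ | ε. A minimal Python sketch follows. It assumes a grammar stored as a dict of symbol lists, with [] for ε, and that the name A′ (written A') is not already in use. Indirect left recursion needs the general algorithm, which first substitutes productions; that step is not shown.

```python
def eliminate_immediate_left_recursion(grammar, A):
    """Return a new grammar in which A's immediate left recursion is removed.
    Productions are lists of symbols; [] stands for epsilon."""
    alphas = [p[1:] for p in grammar[A] if p and p[0] == A]   # A -> A alpha
    betas = [p for p in grammar[A] if not p or p[0] != A]     # A -> beta
    if not alphas:
        return dict(grammar)              # nothing to do
    A_new = A + "'"                       # assumed to be a fresh name
    result = dict(grammar)                # the input grammar is left unchanged
    result[A] = [beta + [A_new] for beta in betas]
    result[A_new] = [alpha + [A_new] for alpha in alphas] + [[]]
    return result


expr_grammar = {
    "E": [["E", "+", "T"], ["T"]],
    "T": [["T", "*", "F"], ["F"]],
    "F": [["(", "E", ")"], ["id"]],
}
step1 = eliminate_immediate_left_recursion(expr_grammar, "E")
no_left_rec = eliminate_immediate_left_recursion(step1, "T")
# E -> T E',  E' -> + T E' | eps,  T -> F T',  T' -> * F T' | eps
```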


Left Factoring 


A predictive parser (a top-down parser without backtracking) insists that the grammar must be left factored.


Left factoring transforms a grammar into a new equivalent grammar suitable for predictive parsing.
stmt → if expr then stmt else stmt | if expr then stmt


When we see if, we cannot know which production rule to choose to rewrite stmt in the derivation.


In general,
A → αβ₁ | αβ₂


where α is not empty and the first symbols of β₁ and β₂ (if they have one) are different.
When processing α, we cannot know whether to expand
A to αβ₁ or
A to αβ₂

But if we rewrite the grammar as follows
A → αA'
A' → β₁ | β₂
we can immediately expand A to αA'.
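One left-factoring step can be sketched as follows. Alternatives of A that start with the same symbol are grouped, their longest common prefix α is pulled out, and a fresh non-terminal collects the remainders. The representation, with symbol lists and [] for ε, and the primed names are assumptions. The new non-terminals may need further factoring, so a full implementation repeats this step until nothing changes.

```python
def common_prefix(seqs):
    """Longest common prefix of several symbol lists."""
    prefix = []
    for symbols in zip(*seqs):
        if all(s == symbols[0] for s in symbols):
            prefix.append(symbols[0])
        else:
            break
    return prefix


def left_factor_once(grammar, A):
    """Rewrite A -> alpha b1 | alpha b2 | ... as A -> alpha A', A' -> b1 | b2 | ..."""
    groups = {}
    for prod in grammar[A]:
        key = prod[0] if prod else None          # group by first symbol
        groups.setdefault(key, []).append(prod)
    result = dict(grammar)
    new_alts, counter = [], 1
    for key, prods in groups.items():
        if key is None or len(prods) == 1:
            new_alts.extend(prods)               # nothing to factor
            continue
        alpha = common_prefix(prods)
        A_new = A + "'" * counter                # assumed to be a fresh name
        counter += 1
        new_alts.append(alpha + [A_new])
        result[A_new] = [p[len(alpha):] for p in prods]
    result[A] = new_alts
    return result


stmt_grammar = {"stmt": [["if", "expr", "then", "stmt", "else", "stmt"],
                         ["if", "expr", "then", "stmt"],
                         ["other"]]}
factored = left_factor_once(stmt_grammar, "stmt")
# stmt -> if expr then stmt stmt' | other ;  stmt' -> else stmt | eps
```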


YACC 

YACC generates C code for a syntax analyzer, or parser. YACC uses grammar rules that allow it to analyze tokens from LEX and create a syntax tree. A syntax tree imposes a hierarchical structure on the tokens; e.g., operator precedence and associativity are apparent in the syntax tree.

YACC takes a default action when there is a conflict. For shift-reduce conflicts, YACC will shift. For reduce-reduce conflicts, it will use the first rule in the listing. It also issues a warning message whenever a conflict exists.


Key Points


• The warning may be suppressed by making the grammar unambiguous.


• The definitions section consists of token declarations and C code bracketed by “%{” and “%}”.


• The BNF grammar is placed in the rules section, and the user subroutines are added in the subroutines section.


• Input to YACC is divided into three sections

... definitions ...
%%
... rules ...
%%
... subroutines ...


Top-down Parsing 

There are two main techniques for top-down parsing
(i) Recursive descent parsing
(ii) Predictive parsing

• Recursive predictive parsing

• Non-recursive predictive parsing
Recursive Descent Parsing (Uses Backtracking)
Backtracking is needed (if a choice of a production rule does not work, we backtrack to try other alternatives). It tries to find the leftmost derivation. It is not efficient.
e.g., If the grammar is

S → aBc

B → bc | b and the input is abc

[Figure: three snapshots of the tree for S. S is expanded to a B c. Trying B → bc consumes b and c and leaves no input for the final c of S, so it fails. After backtracking, B → b gives a tree that matches abc.]


In case of failure, it backtracks to try the next alternative.
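The backtracking behaviour on this example can be sketched with Python generators. Each call yields every input position where a symbol can finish matching, and resuming the generator is the backtracking step. The trace records which alternative was tried at which position. Like any naive recursive descent parser, this sketch loops forever on left-recursive grammars.

```python
GRAMMAR = {
    "S": [["a", "B", "c"]],
    "B": [["b", "c"], ["b"]],     # alternatives are tried in this order
}


def parse(symbol, text, pos, trace):
    """Yield each position where `symbol` can end when matching text from pos."""
    if symbol not in GRAMMAR:                     # terminal: must match the input
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    for alt in GRAMMAR[symbol]:
        trace.append(f"{symbol} -> {''.join(alt)} at {pos}")
        yield from parse_sequence(alt, text, pos, trace)


def parse_sequence(symbols, text, pos, trace):
    if not symbols:
        yield pos
        return
    for mid in parse(symbols[0], text, pos, trace):
        yield from parse_sequence(symbols[1:], text, mid, trace)


def recognize(text, start="S"):
    """Accept if some choice of alternatives consumes the whole input."""
    trace = []
    for end in parse(start, text, 0, trace):
        if end == len(text):
            return True, trace
    return False, trace


ok, steps = recognize("abc")
# steps: S -> aBc at 0, B -> bc at 1 (fails later), B -> b at 1 (after backtracking)
```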


Predictive Parser 


Grammar → eliminate left recursion → left factor → a grammar suitable for predictive parsing, i.e., an LL(1) grammar (with no 100% guarantee).


When rewriting a non-terminal in a derivation step, a predictive parser can uniquely choose a production rule by just looking at the current symbol in the input string.

e.g., stmt → if ...... | ......


• When we are trying to rewrite the non-terminal stmt and the current token is if, we have to choose the first production rule.

• In general, when we are trying to rewrite the non-terminal stmt, we can uniquely choose the production rule by just looking at the current token.

• Even after we eliminate the left recursion in the grammar and left factor it, the grammar may still not be suitable for predictive parsing (it may not be an LL(1) grammar).

Recursive Predictive Parsing Each non-terminal corresponds to a procedure.


e.g., A → aBb | bAB
proc A
{
  case of the current token
  {
    ‘a’: - match the current token with a and move to the next token;
         - call B;
         - match the current token with b and move to the next token;
    ‘b’: - match the current token with b and move to the next token;
         - call A;
         - call B;
  }
}


Non-recursive Predictive Parsing (LL(1) Parser)
• Non-recursive predictive parsing is a table-driven parser.
• It is a top-down parser.
• It is also known as LL(1) parser.

[Figure: the non-recursive predictive parser reads the input buffer, uses a stack and a parsing table, and produces the output.]


Processing of non-recursive predictive parsing
Input buffer Our string to be parsed. We will assume that its end is marked with a special symbol $.
Output A production rule representing a step of the derivation sequence (leftmost derivation) of the string in the input buffer.
Stack Contains the grammar symbols. At the bottom of the stack, there is a special end marker symbol $. Initially the stack contains only the symbol $ and the starting symbol S ($S ← initial stack). When the stack is emptied (i.e., only $ is left in the stack), the parsing is completed.


Parsing table 


• A two-dimensional array M[A, a].
• Each row is a non-terminal symbol.
• Each column is a terminal symbol or the special symbol $.
• Each entry holds a production rule.
Parser Actions


The symbol at the top of the stack (say X) and the current symbol in the input string (say a) determine the parser action.
There are four possible parser actions
(i) If X and a are both $, the parser halts (successful completion).
(ii) If X and a are the same terminal symbol (different from $), the parser pops X from the stack and moves to the next symbol in the input buffer.
(iii) If X is a non-terminal, the parser looks at the parsing table entry M[X, a]. If M[X, a] holds a production rule X → Y₁Y₂...Yₖ, it pops X from the stack and pushes Yₖ, Yₖ₋₁, ..., Y₁ onto the stack (so that Y₁ is on top). The parser also outputs the production rule X → Y₁Y₂...Yₖ to represent a step of the derivation.
(iv) None of the above: error.
All empty entries in the parsing table are errors.
If X is a terminal symbol different from a, this is also an error case.


Functions used in Constructing LL(1) Parsing Tables

• Two functions are used in the construction of LL(1) parsing tables: FIRST and FOLLOW.

• FIRST(α) is the set of terminal symbols which occur as first symbols in strings derived from α, where α is any string of grammar symbols. If α derives ε, then ε is also in FIRST(α).

• FOLLOW(A) is the set of terminals which occur immediately after (follow) the non-terminal A in the strings derived from the starting symbol.

• A terminal a is in FOLLOW(A) if S ⇒* αAaβ.

• $ is in FOLLOW(A) if S ⇒* αA.


To Compute FIRST of any String X

• If X is a terminal symbol, FIRST(X) = {X}.
• If X is a non-terminal symbol and X → ε is a production rule, then ε is in FIRST(X).
• If X is ε, then FIRST(X) = {ε}.
• If X is a non-terminal symbol with a production rule X → Y₁Y₂...Yₙ, or X is itself a string Y₁Y₂...Yₙ, then
  If a terminal a is in FIRST(Yᵢ) and ε is in all FIRST(Yⱼ) for j = 1, ..., i - 1, then a is in FIRST(X).
  If ε is in all FIRST(Yⱼ) for j = 1, ..., n, then ε is in FIRST(X).


To Compute FOLLOW (for Non-terminals)
• If S is the start symbol, $ is in FOLLOW(S).
• If A → αBβ is a production rule, then everything in FIRST(β) except ε is in FOLLOW(B).
• If A → αB is a production rule, or A → αBβ is a production rule and ε is in FIRST(β), then everything in FOLLOW(A) is in FOLLOW(B).
• Apply these rules until nothing more can be added to any FOLLOW set.
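The FIRST and FOLLOW rules above can be computed by repeating them until no set changes (a fixed point). A minimal Python sketch follows. It assumes a grammar stored as a dict from non-terminal to lists of symbol lists, with [] standing for an ε production. The example grammar is the usual expression grammar with left recursion removed.

```python
EPS = "ε"


def compute_first_follow(grammar, start):
    """Return (FIRST, FOLLOW) dicts for the non-terminals of `grammar`."""
    nonterminals = set(grammar)
    first = {A: set() for A in nonterminals}
    follow = {A: set() for A in nonterminals}
    follow[start].add("$")                     # $ is in FOLLOW(S)

    def first_of_string(symbols):
        result = set()
        for Y in symbols:
            f = first[Y] if Y in nonterminals else {Y}
            result |= f - {EPS}
            if EPS not in f:                   # Y cannot vanish: stop here
                return result
        result.add(EPS)                        # every Y can derive ε (or string empty)
        return result

    changed = True
    while changed:                             # FIRST sets to a fixed point
        changed = False
        for A, productions in grammar.items():
            for rhs in productions:
                new = first_of_string(rhs)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True

    changed = True
    while changed:                             # FOLLOW sets to a fixed point
        changed = False
        for A, productions in grammar.items():
            for rhs in productions:
                for i, B in enumerate(rhs):
                    if B not in nonterminals:
                        continue
                    rest = first_of_string(rhs[i + 1:])
                    new = rest - {EPS}         # FIRST(β) except ε
                    if EPS in rest:            # A -> αB, or β derives ε
                        new |= follow[A]
                    if not new <= follow[B]:
                        follow[B] |= new
                        changed = True
    return first, follow


EXPR_GRAMMAR = {
    "E": [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T": [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F": [["(", "E", ")"], ["id"]],
}
first, follow = compute_first_follow(EXPR_GRAMMAR, "E")
```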


Bottom-up Parsing Techniques 


A bottom-up parser creates the parse tree of the given input string from the leaves towards the root. A bottom-up parser tries to find the rightmost derivation of the given input in the reverse order.
(i) S ⇒ ... ⇒ ω (the rightmost derivation of ω).
(ii) The bottom-up parser finds this rightmost derivation in the reverse order, reducing ω step by step back to S.
Bottom-up parsing is also known as shift reduce parsing.


Shift Reduce Parsing 


A shift reduce parser tries to reduce the given input string to the starting symbol. At each reduction step, a substring of the input matching the right side of a production rule is replaced by the non-terminal on the left side of that production rule.


Note If the substring is chosen correctly, the right most derivation of that 
string is created in the reverse order. 


Handle 


Informally, a handle of a string is a substring that matches the right side of a production rule, but not every substring that matches the right side of a production rule is a handle. A handle of a right sentential form γ (= αβω) is a production rule A → β and a position in γ where the string β may be found and replaced by A to produce the previous right sentential form in a rightmost derivation of γ.


S ⇒* αAω ⇒ αβω (rightmost derivation steps), where ω is a string of terminals.
If the grammar is unambiguous, then every right sentential form of the grammar has exactly one handle. A rightmost derivation in reverse can be obtained by handle pruning.


Stack Implementation 
There are four possible actions of a shift reduce parser 
■ Shift The next input symbol is shifted onto the top of the stack.
■ Reduce Replace the handle on the top of the stack by the non-terminal.
■ Accept Successful completion of parsing.
■ Error Parser discovers a syntax error and calls an error recovery routine.


■ The initial stack contains only the end-marker $.
■ The end of the input string is marked by the end-marker $.


Conflicts During Shift Reduce Parsing 
There are CFGs for which a shift reduce parser cannot be used: the stack contents and the next input symbol may not decide the action.


Shift/Reduce Conflict
The parser cannot decide whether to make a shift operation or a reduction.


Reduce/Reduce Conflict
The parser cannot decide which of several reductions to make.


• If a shift reduce parser cannot be used for a grammar, that grammar is called a non-LR(k) grammar.


• An ambiguous grammar can never be an LR grammar.


LR (k)
L: input scanned from left to right
R: rightmost derivation (constructed in reverse)
k: number of input symbols used as a lookahead to determine the parser action


LR (k) grammar specifications


Types of Shift Reduce Parsing 
There are two main categories of shift reduce parsers 


Operator Precedence Parser 

Simple, but supports only a small class of grammars. 
LR Parser 

Covers a wide range of grammars
• SLR Simple LR parser
• CLR Most general LR parser (canonical LR)
• LALR Intermediate LR parser (look-ahead LR)


SLR, CLR and LALR work in same way, but their parsing tables are 
different. 


Operator Precedence Parsing 

Operator grammars are a small but important class of grammars. We may have an efficient operator precedence parser (a shift reduce parser) for an operator grammar.

In an operator grammar, no production rule can have ε on the right side or two adjacent non-terminals on the right side.


Precedence Relations 
In operator precedence parsing, we define three disjoint precedence relations between certain pairs of terminals.
a ⋖ b: b has higher precedence than a.
a ≐ b: b has the same precedence as a.
a ⋗ b: b has lower precedence than a.
The determination of the correct precedence relation between terminals is based on the traditional notions of associativity and precedence of operators.

Using the precedence relations to find handles:
• Scan the string from the left end until the first ⋗ is encountered.
• Then scan backwards (to the left) over any ≐ until a ⋖ is encountered.
• The handle contains everything to the left of the first ⋗ and to the right of the ⋖ encountered.


The handles thus obtained can be used to shift-reduce a given string.
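The handle-finding procedure can be sketched with a hand-written relation table for id, +, * and $ (written `<`, `=`, `>` for ⋖, ≐, ⋗). The table is an assumption built from the usual rules: * binds tighter than +, and both are left-associative. As in common textbook presentations, only terminals are kept on the stack, and non-terminals are left implicit.

```python
# Hypothetical precedence relations: '<' opens a handle, '>' closes it.
REL = {
    ("id", "+"): ">", ("id", "*"): ">", ("id", "$"): ">",
    ("+", "id"): "<", ("+", "+"): ">", ("+", "*"): "<", ("+", "$"): ">",
    ("*", "id"): "<", ("*", "+"): ">", ("*", "*"): ">", ("*", "$"): ">",
    ("$", "id"): "<", ("$", "+"): "<", ("$", "*"): "<",
}


def op_precedence_handles(tokens):
    """Return the terminal part of each handle in the order it is reduced."""
    stack = ["$"]
    tokens = list(tokens) + ["$"]
    i, handles = 0, []
    while True:
        a, b = stack[-1], tokens[i]
        if a == "$" and b == "$":
            return handles                         # accept
        rel = REL.get((a, b))
        if rel in ("<", "="):                      # shift
            stack.append(b)
            i += 1
        elif rel == ">":                           # reduce: pop back to the '<'
            handle = []
            while True:
                popped = stack.pop()
                handle.insert(0, popped)
                if REL.get((stack[-1], popped)) == "<":
                    break
            handles.append(" ".join(handle))
        else:
            raise SyntaxError(f"no precedence relation between {a!r} and {b!r}")


print(op_precedence_handles(["id", "+", "id", "*", "id"]))
# ['id', 'id', 'id', '*', '+']: the * is reduced before the +
```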


Handling Unary Minus 

• Operator precedence parsing cannot handle the unary minus when we also use the binary minus in our grammar.

• The best approach to solve this problem is to let the lexical analyzer handle it, as the lexical analyzer will return two different operators for the unary minus and the binary minus.

• The lexical analyzer will need a look-ahead to distinguish the binary minus from the unary minus.
LR parsing

LR parsing is the most general non-backtracking shift reduce parsing method. The class of grammars that can be parsed using LR methods is a proper superset of the class of grammars that can be parsed with predictive parsers.


LL(1) grammars ⊂ LR(1) grammars


An LR parser can detect a syntactic error as soon as it is possible to do so on a left-to-right scan of the input.
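[Figure: the LR parsing algorithm reads the input a₁ ... aᵢ ... aₙ $ and keeps a stack S₀ X₁ S₁ ... Xₘ Sₘ. It consults the action table (columns: terminals and $; four different actions) and the goto table (columns: non-terminals; each entry is a state number), and produces the output.]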
LR parser configuration
A configuration of an LR parser is
$(S_0 X_1 S_1 \ldots X_m S_m,\; a_i a_{i+1} \ldots a_n \$)$
where the first component is the stack and the second is the rest of the input.


• $S_m$ and $a_i$ decide the parser action by consulting the parsing action table (the initial stack contains just $S_0$).


• A configuration of an LR parser represents the right sentential form $X_1 \ldots X_m a_i a_{i+1} \ldots a_n \$$.
LR Parser Actions
Shift s Shift the next input symbol and the state s onto the stack:
$(S_0 X_1 S_1 \ldots X_m S_m,\; a_i a_{i+1} \ldots a_n \$) \to (S_0 X_1 S_1 \ldots X_m S_m\, a_i\, s,\; a_{i+1} \ldots a_n \$)$
Reduce A → β (or rn, where n is a production number) Pop $2|\beta|$ (= 2r) items from the stack, assuming $\beta = Y_1 Y_2 \ldots Y_r$. Then push A and s, where $s = \text{goto}[S_{m-r}, A]$:
$(S_0 X_1 S_1 \ldots X_m S_m,\; a_i \ldots a_n \$) \to (S_0 X_1 S_1 \ldots X_{m-r} S_{m-r}\, A\, s,\; a_i \ldots a_n \$)$
Accept Parsing successfully completed.
Error Parser detected an error (an empty entry in the action table).
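A minimal driver for these four actions can be written over a hypothetical grammar (1) E → E + T, (2) E → T, (3) T → id. Its SLR(1) action and goto tables were filled in by hand for this sketch. The stack holds S₀ X₁ S₁ ... Xₘ Sₘ exactly as in the configuration above, so a reduction pops 2|β| items. The list of reductions it returns is the rightmost derivation in reverse.

```python
# Hypothetical grammar: (1) E -> E + T   (2) E -> T   (3) T -> id
PRODUCTIONS = {1: ("E", ["E", "+", "T"]), 2: ("E", ["T"]), 3: ("T", ["id"])}
ACTION = {
    (0, "id"): ("shift", 3),
    (1, "+"): ("shift", 4), (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "+"): ("reduce", 3), (3, "$"): ("reduce", 3),
    (4, "id"): ("shift", 3),
    (5, "+"): ("reduce", 1), (5, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1, (0, "T"): 2, (4, "T"): 5}


def lr_parse(tokens):
    """Return the reductions performed (a rightmost derivation in reverse)."""
    tokens = list(tokens) + ["$"]
    stack = [0]                                  # S0 X1 S1 ... Xm Sm, Sm on top
    i, reductions = 0, []
    while True:
        state, a = stack[-1], tokens[i]
        entry = ACTION.get((state, a))
        if entry is None:                        # empty entry: error
            raise SyntaxError(f"unexpected {a!r} in state {state}")
        kind, arg = entry
        if kind == "shift":                      # push a_i and the new state
            stack += [a, arg]
            i += 1
        elif kind == "reduce":
            lhs, rhs = PRODUCTIONS[arg]
            del stack[len(stack) - 2 * len(rhs):]     # pop 2|beta| items
            stack += [lhs, GOTO[(stack[-1], lhs)]]    # push A and goto[S(m-r), A]
            reductions.append(f"{lhs} -> {' '.join(rhs)}")
        else:
            return reductions                    # accept


print(lr_parse(["id", "+", "id"]))
# ['T -> id', 'E -> T', 'T -> id', 'E -> E + T']
```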


Syntax Directed Translation 


Grammar symbols are associated with attributes to associate information 
with the programming language constructs that they represent. Values of 
these attributes are evaluated by the semantic rules associated with the 
production rules. 
Evaluation of the semantic rules
(i) may generate intermediate code
(ii) may put information into the symbol table
(iii) may perform type checking
(iv) may issue error messages
(v) may perform some other activities
• An attribute may hold a string, a number, a memory location, a complex record, etc.
• Evaluation of a semantic rule defines the value of an attribute, but a semantic rule may also have side effects such as printing a value.


e.g.,

Production        Semantic Rule               Program Fragment
L → E return      print(E.val)                print(val[top - 1])
E → E₁ + T        E.val = E₁.val + T.val      val[ntop] = val[top - 2] + val[top]
E → T             E.val = T.val
T → T₁ * F        T.val = T₁.val * F.val      val[ntop] = val[top - 2] * val[top]
T → F             T.val = F.val
F → (E)           F.val = E.val               val[ntop] = val[top - 1]
F → digit         F.val = digit.lexval        val[top] = digit.lexval


Syntax directed translation table
• Symbols E, T and F are associated with an attribute val.
• The token digit has an attribute lexval (it is assumed that it is evaluated by the lexical analyzer).
• The program fragment above represents the implementation of the semantic rules for a bottom-up parser.
• At each shift of digit, we also push digit.lexval onto val_stack.
• At all other shifts, we do not put anything into val_stack because other terminals do not have attributes (but we increment the stack pointer for val_stack).

• The above model is suited for a desk calculator, where the purpose is to evaluate expressions and not to generate code.


Intermediate Code Generation 


Intermediate codes are machine-independent codes, but they are close to machine instructions. The given program in a source language is converted to an equivalent program in an intermediate language by the intermediate code generator.
■ The designer of the compiler decides the intermediate language.
■ Syntax trees can be used as an intermediate language.
■ Postfix notation and three-address code (quadruples) can be used as an intermediate language.


Syntax Tree 


A syntax tree is a variant of the parse tree, where each leaf represents an operand and each interior node represents an operator.


Production        Semantic Rule
E → E₁ op E₂      E.val = NODE(op, E₁.val, E₂.val)
E → (E₁)          E.val = E₁.val
E → -E₁           E.val = UNARY(-, E₁.val)
E → id            E.val = LEAF(id)


A sentence a * (b + d) would have the following syntax tree
[Figure: root * with children a and +; the + node has children b and d.]


Example of a syntax tree








The following translation scheme produces postfix notation.

Production        Semantic Rule                        Program Fragment
E → E₁ op E₂      E.code = E₁.code || E₂.code || op    print op
E → (E₁)          E.code = E₁.code
E → id            E.code = id                          print id
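Both schemes can be tried together in a short sketch. NODE, UNARY and LEAF build the syntax tree as in the first table, and the postfix traversal implements E.code = E₁.code || E₂.code || op. The upper-case function names mirror the text rather than Python naming conventions, and the tree classes are illustrative assumptions. The parentheses rule E → (E₁) creates no node, so the parentheses do not appear in the tree.

```python
from dataclasses import dataclass


@dataclass
class Leaf:
    name: str          # LEAF(id): an operand


@dataclass
class Node:
    op: str            # interior node: an operator
    children: tuple    # (E1, E2) for NODE, (E1,) for UNARY


def NODE(op, left, right):     # E -> E1 op E2
    return Node(op, (left, right))


def UNARY(op, operand):        # E -> -E1
    return Node(op, (operand,))


def LEAF(name):                # E -> id
    return Leaf(name)


def postfix(tree):
    """E.code = E1.code || E2.code || op: operands first, then the operator."""
    if isinstance(tree, Leaf):
        return tree.name
    return " ".join([postfix(child) for child in tree.children] + [tree.op])


tree = NODE("*", LEAF("a"), NODE("+", LEAF("b"), LEAF("d")))   # a * (b + d)
print(postfix(tree))   # a b d + *
```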
Three-Address Code
In three-address code, each statement contains at most three addresses (two for the operands and one for the result). The most general kind of three-address statement is
x = y op z
where x, y and z are names, constants or compiler-generated temporaries and op is any operator.


We can also use the following notation for quadruples (a better notation, because it looks like a machine code instruction)


op y, z, x
Apply operator op to y and z and store the result in x.
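Three-address code can be generated with a post-order walk of the expression tree that creates a new temporary for every operator. The sketch below represents expressions as nested tuples: a name string, ('uminus', e) or (op, e1, e2). That representation and the Tn temporary names are assumptions. It emits quadruples (op, arg1, arg2, result) and is checked against the statement A = -B * (C + D).

```python
from itertools import count


def gen_tac(target, expr):
    """Return quadruples (op, arg1, arg2, result) for target = expr."""
    quads = []
    temps = count(1)

    def gen(e):
        if isinstance(e, str):
            return e                                  # a name needs no code
        if e[0] == "uminus":
            arg = gen(e[1])
            t = f"T{next(temps)}"
            quads.append(("uminus", arg, None, t))
            return t
        op, left, right = e
        a1, a2 = gen(left), gen(right)                # operands first
        t = f"T{next(temps)}"
        quads.append((op, a1, a2, t))
        return t

    quads.append(("=", gen(expr), None, target))      # final copy into target
    return quads


def show(quad):
    """Print a quadruple in x = y op z form."""
    op, a1, a2, res = quad
    if op == "=":
        return f"{res} = {a1}"
    if op == "uminus":
        return f"{res} = -{a1}"
    return f"{res} = {a1} {op} {a2}"


quads = gen_tac("A", ("*", ("uminus", "B"), ("+", "C", "D")))
for q in quads:
    print(show(q))   # T1 = -B, T2 = C + D, T3 = T1 * T2, A = T3
```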


Representation of Three-Address Codes 

Three-address code can be represented in various forms i.e., quadruples, 
triples and indirect triples. These forms are demonstrated by way of 
examples below. 


























e.g., A = -B * (C + D)
The three-address code is as follows:
T1 = -B
T2 = C + D
T3 = T1 * T2
A = T3

(uminus denotes unary minus.)

Quadruples
      Operator   Operand 1   Operand 2   Result
(1)   uminus     B                       T1
(2)   +          C           D           T2
(3)   *          T1          T2          T3
(4)   =          T3                      A

Triples
      Operator   Operand 1   Operand 2
(1)   uminus     B
(2)   +          C           D
(3)   *          (1)         (2)
(4)   =          A           (3)
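The quadruples above can be produced by a post-order walk of the syntax tree that creates a new temporary for each interior node. The sketch below assumes a hypothetical `Expr` tree type, fixed-size string fields and a global quadruple array; `gen` and `gen_assign` are illustrative names. For A = -B * (C + D) it emits the same four quadruples as the table.

```c
#include <stdio.h>
#include <string.h>

/* One quadruple: op, arg1, arg2, result (short strings keep the sketch simple). */
typedef struct { char op[8], arg1[8], arg2[8], result[8]; } Quad;

/* Hypothetical expression tree: op is NULL for a name, else "uminus", "+", "*", ... */
typedef struct Expr {
    const char *op;
    const char *name;      /* variable name for leaves */
    struct Expr *l, *r;    /* r is NULL for unary operators */
} Expr;

Quad quads[32];
int nquads = 0;
int ntemps = 0;

static void emit(const char *op, const char *a1, const char *a2, const char *res) {
    Quad *q = &quads[nquads++];
    snprintf(q->op, sizeof q->op, "%s", op);
    snprintf(q->arg1, sizeof q->arg1, "%s", a1);
    snprintf(q->arg2, sizeof q->arg2, "%s", a2);
    snprintf(q->result, sizeof q->result, "%s", res);
}

/* Generate code for e in post-order; place receives the name holding its value. */
static void gen(const Expr *e, char *place) {
    if (e->op == NULL) {                 /* a name needs no code */
        strcpy(place, e->name);
        return;
    }
    char a[8], b[8] = "";
    gen(e->l, a);
    if (e->r != NULL)
        gen(e->r, b);
    snprintf(place, 8, "T%d", ++ntemps); /* new temporary for this node */
    emit(e->op, a, b, place);
}

/* Generate quadruples for target = e. */
void gen_assign(const char *target, const Expr *e) {
    char place[8];
    gen(e, place);
    emit("=", place, "", target);
}
```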
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Indirect triple 

















Statement
(0)   (56)
(1)   (57)
(2)   (58)
(3)   (59)

       Operator   Operand 1   Operand 2
(56)   uminus     B
(57)   +          C           D
(58)   *          (56)        (57)
(59)   =          A           (58)
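With indirect triples, the statement list holds pointers into the triple table, so an optimizer can reorder statements by permuting the list while the triples, which refer to each other by position, stay untouched. A minimal sketch of that idea, using array indices as the "pointers" and the numbering (56) to (59) from the table above; `Triple`, `build_example` and `exec_order` are hypothetical names.

```c
#include <stddef.h>

/* A triple: each operand is a name, or (when the name is NULL) a reference
   to another triple by its position in the table. */
typedef struct {
    const char *op;
    const char *arg1, *arg2;
    int ref1, ref2;           /* referenced triple, or -1 when unused */
} Triple;

Triple table[60];             /* positions 56 to 59 are used below */

/* Fill the triple table for A = -B * (C + D). */
void build_example(void) {
    table[56] = (Triple){"uminus", "B", NULL, -1, -1};
    table[57] = (Triple){"+", "C", "D", -1, -1};
    table[58] = (Triple){"*", NULL, NULL, 56, 57};
    table[59] = (Triple){"=", "A", NULL, -1, 58};
}

/* The statement list holds positions of triples; execution follows the list,
   so ops[i] receives the operator of the i-th statement executed. */
void exec_order(const int *stmt, int n, const char **ops) {
    for (int i = 0; i < n; i++)
        ops[i] = table[stmt[i]].op;
}
```

Swapping the first two entries of the statement list (57 before 56) changes the execution order, while triple (58) still refers to (56) and (57).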











Symbol Tables 


A symbol table is a data structure meant to collect information about the names
appearing in the source program. It keeps track of scope and binding
information about names. Each entry in the symbol table is a pair of the
form (name, information).

• Information consists of attributes (e.g., type, location) depending on the
language.

• Whenever a name is encountered, it is checked in the symbol table to see
whether it already occurs. If not, a new entry is created.


In some cases, the symbol table record is created by the lexical analyzer
as soon as the name is encountered in the input, and the attributes of the
name are entered when the declarations are processed.


If the same name can be used to denote different program elements in the
same block, the symbol table record is created only when the name's
semantic role is discovered.


Operations on Symbol Table 
• Determine whether a given name is in the table.
• Add a new name to the table.
• Access information associated with a given name.
• Add new information for a given name.
• Delete a name (or a group of names) from the table.
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Implementation of Symbol Table 

• Each entry in a symbol table can be implemented as a record that
consists of several fields.

• The entries in symbol table records are not uniform and depend on the
program element identified by the name.

• Some information about the name may be kept outside of the symbol table
record and/or some fields of the record may be left vacant for the sake of
uniformity. A pointer to this information may be stored in the record.

• The name may be stored in the symbol table record itself, or it can be stored in
a separate array of characters with a pointer to it in the symbol table.

• The information about the runtime storage location, to be used at the time of
code generation, is to be kept in the symbol table.

• Various approaches to symbol table organisation exist, e.g., linear list, search
tree and hash table.


Linear List 

• It is the simplest approach to symbol table organisation.

• New names are added to the table in the order they arrive.
• A name is searched for linearly.
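A linear-list symbol table can be sketched as an array searched from the front. The record fields (`name`, `type`), the fixed capacity and the function names are assumptions for illustration; the `comparisons` counter is there only to show the (n + 1)/2 average cost of a successful search.

```c
#include <string.h>

#define MAXSYM 100

/* A symbol-table record: the name and one illustrative attribute. */
typedef struct {
    char name[32];
    char type[16];
} Symbol;

Symbol list[MAXSYM];     /* names kept in order of arrival */
int nsym = 0;            /* number of entries in use */
long comparisons = 0;    /* name comparisons made, to show the search cost */

/* Linear search from the front; returns the index of name or -1. */
int lookup(const char *name) {
    for (int i = 0; i < nsym; i++) {
        comparisons++;
        if (strcmp(list[i].name, name) == 0)
            return i;
    }
    return -1;
}

/* Append name at the end if it is not already present; returns its index
   (or -1 if the table is full). */
int insert(const char *name, const char *type) {
    int i = lookup(name);
    if (i >= 0)
        return i;
    if (nsym == MAXSYM)
        return -1;
    strncpy(list[nsym].name, name, sizeof list[nsym].name - 1);
    strncpy(list[nsym].type, type, sizeof list[nsym].type - 1);
    return nsym++;
}
```

With three entries, looking each one up once costs 1 + 2 + 3 = 6 comparisons, i.e., (3 + 1)/2 = 2 per search on average.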


Key Points

+ The average number of comparisons required in a linear list is proportional
to 0.5 * (n + 1), where n = number of entries in the table.

+ It takes less space but more access time.

+ In a search tree, the time for adding n names and making m inquiries is
proportional to (m + n) log n.

+ The hash function maps the name into an integer value between 0 and k - 1
and uses it as an index in the hash table to search the list of table records
that are built on that hash index.


Search Tree 

• It is more efficient than linear lists.

• Each record has two links, left and right, which point to other records in the search tree.

• A new name is added at a proper location in the tree such that it can be
accessed alphabetically.

• For any node A1 in the tree, all nodes accessible by following its left link
precede node A1 alphabetically.

• Similarly, for any node A1 in the tree, all names accessible by following its
right link succeed A1 alphabetically.
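A search-tree symbol table can be sketched as a binary search tree keyed on the name, using `strcmp` for alphabetical order. The tree is not balanced, a simplification that keeps the sketch short (so the log n behaviour holds only on average); `TreeSym`, `tree_insert`, `tree_find` and `inorder` are hypothetical names. An in-order traversal lists the names alphabetically.

```c
#include <stdlib.h>
#include <string.h>

/* A search-tree record: names reachable via left precede this one
   alphabetically, names reachable via right follow it. */
typedef struct TreeSym {
    char name[32];
    struct TreeSym *left, *right;
} TreeSym;

/* Return the record for name, adding it at its alphabetical place if absent. */
TreeSym *tree_insert(TreeSym **root, const char *name) {
    while (*root != NULL) {
        int c = strcmp(name, (*root)->name);
        if (c == 0)
            return *root;                          /* already present */
        root = (c < 0) ? &(*root)->left : &(*root)->right;
    }
    TreeSym *n = calloc(1, sizeof *n);             /* left = right = NULL */
    strncpy(n->name, name, sizeof n->name - 1);
    *root = n;
    return n;
}

/* Return the record for name, or NULL if it is not in the tree. */
TreeSym *tree_find(TreeSym *t, const char *name) {
    while (t != NULL) {
        int c = strcmp(name, t->name);
        if (c == 0)
            return t;
        t = (c < 0) ? t->left : t->right;
    }
    return NULL;
}

/* Append the names to buf in alphabetical (in-order) order, space separated. */
void inorder(const TreeSym *t, char *buf) {
    if (t == NULL)
        return;
    inorder(t->left, buf);
    strcat(buf, t->name);
    strcat(buf, " ");
    inorder(t->right, buf);
}
```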
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Hash Table 

• A hash table is a table of k pointers, indexed from 0 to k - 1, that point to
records within the symbol table.

• To search for a name, we find the hash value of the name by applying a
suitable hash function and then search the list at that index.

• To add a non-existent name, we create a record for that name and insert it
at the head of the list.
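A hash-table symbol table with chaining can be sketched as follows. The table size K = 211 and the multiply-by-31 string hash are arbitrary choices made for the sketch, not a prescribed function; as described above, a new name is inserted at the head of its chain. All names are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define K 211                   /* number of hash-table slots, 0 .. K-1 */

/* A symbol-table record on a hash chain. */
typedef struct HashSym {
    char name[32];
    struct HashSym *next;       /* next record with the same hash value */
} HashSym;

HashSym *bucket[K];             /* the k pointers, initially all NULL */

/* Map a name to an integer in 0 .. K-1 (one common choice of string hash). */
unsigned hash(const char *s) {
    unsigned h = 0;
    while (*s)
        h = h * 31 + (unsigned char)*s++;
    return h % K;
}

/* Search the chain selected by the name's hash value. */
HashSym *hash_find(const char *name) {
    for (HashSym *p = bucket[hash(name)]; p != NULL; p = p->next)
        if (strcmp(p->name, name) == 0)
            return p;
    return NULL;
}

/* Add a name that is not yet present by inserting a new record at the head of its chain. */
HashSym *hash_insert(const char *name) {
    HashSym *p = hash_find(name);
    if (p != NULL)
        return p;
    unsigned h = hash(name);
    p = calloc(1, sizeof *p);
    strncpy(p->name, name, sizeof p->name - 1);
    p->next = bucket[h];
    bucket[h] = p;
    return p;
}
```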


Code Optimization 
Code optimization refers to obtaining more efficient code. While performing optimization,
there are two points that we need to focus on:
• We must ensure that the transformed program is semantically equivalent to the
original program.
• The improvement of the program efficiency must be achieved without changing the
algorithms which are used in the program.


Optimization is classified as
• Machine dependent (it exploits characteristics of the target machine).
• Machine independent (it is based on mathematical properties of a sequence of
source statements).


Techniques used in Optimization 
The following are the techniques used in optimization.


Common Sub-expression Elimination 
An expression need not be evaluated again if it was previously computed and the
values of the variables in this expression have not changed since the earlier
computation.
e.g., a = b * c;
      d = b * c + x - y;
We can eliminate the second evaluation of b * c from this code if none of the
intervening statements has changed its value. So, we can rewrite the
above code as
T1 = b * c;
a = T1;
d = T1 + x - y;
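Common sub-expression elimination within a basic block can be sketched on three-address statements. The sketch assumes a hypothetical `Stmt` record (x = y op z, with op == 0 meaning the copy x = y) and reuses an earlier result only if none of y, z or the earlier result has been reassigned in between. Applied to the example written as a = b * c; t1 = b * c; t2 = t1 + x; d = t2 - y, it rewrites the second statement to t1 = a.

```c
#include <string.h>

/* Three-address statement x = y op z; op == 0 means the copy x = y. */
typedef struct { char x[8], y[8]; char op; char z[8]; } Stmt;

static int same(const char *a, const char *b) { return strcmp(a, b) == 0; }

/* Local CSE over a basic block of n statements; returns how many were replaced. */
int cse(Stmt *s, int n) {
    int replaced = 0;
    for (int i = 0; i < n; i++) {
        if (s[i].op == 0)
            continue;                              /* copies compute nothing */
        for (int j = 0; j < i; j++) {
            if (s[j].op != s[i].op || !same(s[j].y, s[i].y) || !same(s[j].z, s[i].z))
                continue;                          /* different expression */
            if (same(s[j].x, s[j].y) || same(s[j].x, s[j].z))
                continue;                          /* s[j] overwrote its own operand */
            int killed = 0;                        /* operand or result reassigned? */
            for (int k = j + 1; k < i; k++)
                if (same(s[k].x, s[i].y) || same(s[k].x, s[i].z) || same(s[k].x, s[j].x))
                    killed = 1;
            if (!killed) {                         /* reuse: x = earlier result */
                s[i].op = 0;
                strcpy(s[i].y, s[j].x);
                s[i].z[0] = '\0';
                replaced++;
                break;
            }
        }
    }
    return replaced;
}
```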
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Compile Time Evaluation 

We can improve the execution efficiency of a program by shifting execution-time
actions to compile time.

e.g., A = 2 * (22.0 / 7.0) * r

Here, we can perform the computation 2 * (22.0 / 7.0) at compile time
itself. This is known as folding.


Using Constant Propagation 


x = 12.4;                          x = 12.4;
y = x / 2.3;    Optimised    →     y = 12.4 / 2.3;

In the above example, y can be computed directly at compile time.

If a variable is assigned a constant value and is used in an expression
without being assigned another value in between, we can evaluate some portion
of the expression using the constant value.
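Folding and constant propagation can be sketched together on an expression tree: a leaf naming a variable known to hold a constant is replaced by that constant, and any operator whose operands are both constants is evaluated at compile time. The `known`/`known_value` tables (indexed by single lowercase letters) are a hypothetical stand-in for the information a real compiler would gather by data-flow analysis; division by zero is deliberately left for run time.

```c
#include <stddef.h>

/* Expression tree node: interior nodes carry an operator, leaves a constant or a variable. */
typedef struct E {
    char op;          /* '+', '-', '*' or '/' for interior nodes; 0 for leaves */
    int is_const;     /* 1 if this node is (or has become) a known constant */
    double value;     /* the constant's value when is_const is 1 */
    char var;         /* variable name (a to z) for non-constant leaves */
    struct E *l, *r;
} E;

/* Hypothetical facts for constant propagation: known[v] is 1 if variable
   'a' + v currently holds the constant known_value[v]. */
int known[26];
double known_value[26];

/* Fold e in place; returns 1 if e is now a compile-time constant. */
int fold(E *e) {
    if (e->op == 0) {
        if (!e->is_const && e->var >= 'a' && e->var <= 'z' && known[e->var - 'a']) {
            e->is_const = 1;                       /* propagate the constant */
            e->value = known_value[e->var - 'a'];
        }
        return e->is_const;
    }
    int lc = fold(e->l);
    int rc = fold(e->r);
    if (!lc || !rc)
        return 0;
    double a = e->l->value, b = e->r->value;
    switch (e->op) {
    case '+': e->value = a + b; break;
    case '-': e->value = a - b; break;
    case '*': e->value = a * b; break;
    case '/':
        if (b == 0)
            return 0;                              /* leave x / 0 to run time */
        e->value = a / b;
        break;
    default:
        return 0;
    }
    e->op = 0;                                     /* node becomes a constant leaf */
    e->is_const = 1;
    return 1;
}
```

For A = 2 * (22.0 / 7.0) * r, the subtree 2 * (22.0 / 7.0) folds to one constant while the multiplication by r stays; with x known to be 12.4, x / 2.3 folds completely.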


Using Variable Propagation 
If a variable is assigned to another variable, we can use one in place of the other.

This is useful for carrying out other optimisations that would otherwise not be
possible.

e.g.,
c = a * b;                                c = a * b;
x = a;                                    x = a;
  ...                                       ...
d = x * b + 4;   Variable propagation →   d = a * b + 4;

(Variable propagation technique)

In the above example, once x is replaced by a, the expressions a * b and x * b are
identified as common sub-expressions.


Using Dead Code Elimination 

If the value contained in a variable at a point is not used anywhere in the
program subsequently, the variable is said to be dead at that point.
Variable propagation often turns an assignment statement into
dead code.

e.g., after variable propagation, the code becomes

c = a * b;
x = a;
  ...
d = a * b + 4;

If x is not used later, the assignment x = a is now useless and can be removed:

c = a * b;
  ...
d = a * b + 4;

(Dead code elimination technique)
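Variable propagation followed by dead code elimination can be sketched on straight-line code. The sketch assumes single-letter variables, digits as constants, op == 0 for a copy x = y, at most 64 statements, and a caller-supplied list of the variables that are live after the block; all names are hypothetical. Liveness is computed by a backward pass, and an assignment whose target is not live at that point is dropped.

```c
#include <ctype.h>

/* Statement x = y op z on single-letter variables; digits are constants and
   op == 0 marks the copy x = y. */
typedef struct { char x, y, op, z; } S;

/* Variable (copy) propagation: after x = y, replace later uses of x by y
   until x or y is reassigned. */
void propagate_copies(S *s, int n) {
    for (int i = 0; i < n; i++) {
        if (s[i].op != 0 || !isalpha((unsigned char)s[i].y))
            continue;
        char x = s[i].x, y = s[i].y;
        for (int j = i + 1; j < n; j++) {
            if (s[j].y == x) s[j].y = y;               /* uses come first ... */
            if (s[j].op != 0 && s[j].z == x) s[j].z = y;
            if (s[j].x == x || s[j].x == y) break;     /* ... then the definition */
        }
    }
}

/* Dead code elimination by a backward liveness pass over at most 64
   statements; live_out lists the variables used after the block.
   Live statements are compacted to the front; returns their number. */
int eliminate_dead(S *s, int n, const char *live_out) {
    int live[128] = {0};
    int keep[64] = {0};
    for (const char *v = live_out; *v; v++)
        live[(unsigned char)*v] = 1;
    for (int i = n - 1; i >= 0; i--) {
        if (!live[(unsigned char)s[i].x])
            continue;                                  /* x is dead here: drop */
        keep[i] = 1;
        live[(unsigned char)s[i].x] = 0;               /* this assignment defines x */
        if (isalpha((unsigned char)s[i].y))
            live[(unsigned char)s[i].y] = 1;
        if (s[i].op != 0 && isalpha((unsigned char)s[i].z))
            live[(unsigned char)s[i].z] = 1;
    }
    int m = 0;
    for (int i = 0; i < n; i++)
        if (keep[i])
            s[m++] = s[i];
    return m;
}
```

On c = a * b; x = a; t = x * b; d = t + 4 with c and d live on exit, propagation rewrites t = a * b, and the copy x = a is then removed.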
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Using Code Motion 
Evaluation of an expression is moved from one part of the program to another
in such a way that it is evaluated less frequently.


We can bring loop-invariant statements out of loops.

e.g.,
a = 200;
while (a > 0)
{
    b = x + y;
    if (a % b == 0)
        printf("%d ", a);
    a--;
}

The statement b = x + y is invariant, so we can bring it outside the loop:

a = 200;
b = x + y;
while (a > 0)
{
    if (a % b == 0)
        printf("%d ", a);
    a--;
}

Code motion technique








Using Induction Variable and Strength Reduction 

An induction variable may be defined as an integer scalar variable which is
used in a loop for assignments of the form i = i + constant.
Strength reduction refers to replacing a high-strength operator by a
low-strength operator (e.g., a multiplication by an addition). Strength
reduction is used on induction variables to achieve more efficient code.

e.g.,
i = 1;
while (i < 10)
{
    y = i * 4;
    i = i + 1;
}

is transformed into

i = 1;
t = 4;
while (t < 40)
{
    y = t;
    t = t + 4;
}

Induction variable and strength reduction technique


Use of Algebraic Identities 
Certain computations that look different to the compiler, and are not
identified as common sub-expressions, are actually the same.

An expression B op C will usually be treated as being different from C op B.
But for certain operations (like addition and multiplication), both forms
produce the same result.

We can achieve further optimization by treating them as common
sub-expressions for such operations.
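One way to exploit these identities is to put the operands of commutative operators into a canonical order before comparing expressions, so that b * c and c * b get the same key. A minimal sketch, assuming a hypothetical `Quad3` record and alphabetical order as the canonical order; non-commutative operators such as - are left alone.

```c
#include <string.h>

/* Three-address statement x = y op z (hypothetical record). */
typedef struct { char x[8], y[8]; char op; char z[8]; } Quad3;

/* For commutative operators (+ and *), order the operands alphabetically so
   that y op z and z op y are stored identically. */
void canonicalise(Quad3 *q) {
    if ((q->op == '+' || q->op == '*') && strcmp(q->y, q->z) > 0) {
        char tmp[8];
        strcpy(tmp, q->y);
        strcpy(q->y, q->z);
        strcpy(q->z, tmp);
    }
}

/* Two statements compute the same expression if operator and operands agree. */
int same_expr(const Quad3 *a, const Quad3 *b) {
    return a->op == b->op && strcmp(a->y, b->y) == 0 && strcmp(a->z, b->z) == 0;
}
```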




Run-time Administration 


It refers to how we allocate space for the generated target code and
the data objects of our source programs. The places of the data objects
that can be determined at compile time are allocated statically, but the
places for some data objects are allocated at run-time.


The allocation and deallocation of the data objects is managed by the
run-time support package. The run-time support package is loaded together
with the generated target code. The structure of the run-time support
package depends on the semantics of the programming language
(especially the semantics of procedures in that language).


Procedure Activation 


An execution of a procedure starts at the beginning of the procedure body.
When the procedure is completed, it returns control to the point
immediately after the place where that procedure was called. Each execution
of a procedure is called an activation of that procedure.

• The lifetime of an activation of a procedure is the sequence of steps in the
execution of that procedure (including the other procedures called by that procedure).

• If a and b are procedure activations, then their lifetimes are either
non-overlapping or nested.

• If a procedure is recursive, a new activation can begin before an earlier
activation of the same procedure has ended.


Activation Tree 

We can create a tree (known as an activation tree) to show the way control
enters and leaves activations. In an activation tree
• Each node represents an activation of a procedure.
• The root represents the activation of the main program.
• The node a is the parent of the node b if and only if control flows from
activation a to activation b.
• The node a is to the left of the node b if the lifetime of a occurs before the
lifetime of b.
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e.g., 
Program main;
    Procedure s;
        begin ... end;
    Procedure p;
        Procedure q;
            begin ... end;
        begin q; s; end;
    begin p; s; end;

Execution: enter main, enter p, enter q, exit q, enter s, exit s, exit p,
enter s, exit s, exit main

Activation tree
        main
       /    \
      p      s
     / \
    q   s





Control Stack 


The flow of control in a program corresponds to a depth-first traversal of
the activation tree that
(i) Starts at the root.
(ii) Visits a node before its children.
(iii) Recursively visits the children of each node in left-to-right order.
• A stack called the control stack can be used to keep track of live procedure
activations.
(i) An activation record is pushed onto the control stack as the
activation starts.
(ii) That activation record is popped when that activation ends.

• When node n is at the top of the control stack, the stack contains the
nodes along the path from n to the root.
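The example program above can be simulated with an explicit control stack to check the trace and the path-to-root property. This is an illustrative sketch: the procedures only push and pop their names, and `enter`, `leave` and `run_main` are hypothetical helpers. While q is active, the stack holds main, p, q, which is the path from q to the root of the activation tree.

```c
#include <string.h>

static const char *control_stack[16];   /* names of live activations */
static int top = 0;                      /* number of entries on the stack */
static char trace[256];                  /* enter/exit events, space separated */
static char stack_at_q[64];              /* stack contents seen while q is active */

static void log_event(const char *what, const char *name) {
    strcat(trace, what);
    strcat(trace, " ");
    strcat(trace, name);
    strcat(trace, " ");
}

static void enter(const char *name) {     /* activation starts: push */
    control_stack[top++] = name;
    log_event("enter", name);
}

static void leave(const char *name) {     /* activation ends: pop */
    top--;
    log_event("exit", name);
}

static void s(void) { enter("s"); leave("s"); }

static void q(void) {
    enter("q");
    for (int i = 0; i < top; i++) {       /* path from the root down to q */
        strcat(stack_at_q, control_stack[i]);
        strcat(stack_at_q, " ");
    }
    leave("q");
}

static void p(void) { enter("p"); q(); s(); leave("p"); }

/* Run the body of main: begin p; s; end */
void run_main(void) {
    trace[0] = '\0';
    stack_at_q[0] = '\0';
    top = 0;
    enter("main");
    p();
    s();
    leave("main");
}
```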


Variable Scope 

The scope rules of the language determine which declaration of a name
applies when the name appears in the program.

An occurrence of a variable is local if that occurrence is in the same
procedure in which that name is declared, and the variable is non-local if it
is declared outside of that procedure.
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e.g., 
procedure q;
    var a: real;
    procedure r;
        var b: integer;
        begin b := 4; a := 2; end;
    begin ... end;

Variable b is local to procedure r and variable a is non-local to procedure r.











Storage Organisation 






The run-time memory is divided into the following areas:
• Code: memory locations for code are determined at compile time.
• Static data: locations of static data can also be determined at compile time.
• Stack: data objects (activation records) allocated at run-time.
• Heap: other data objects dynamically allocated at run-time (for example, the
malloc area in C).


Activation Record 

Information needed by a single execution of a procedure is managed using
a contiguous block of storage called an activation record. When a procedure
is entered, an activation record is allocated, and it is deallocated when that
procedure exits. The size of each field can be determined at compile time,
although the actual location of the activation record is determined at run-time.


Key Points

+ If a procedure has a local variable whose size depends on a parameter, its
size is determined at run-time.

+ Some part of the activation record of a procedure is created by that procedure
immediately after that procedure is entered, and some part is created by the
caller of that procedure before that procedure is entered.




Field                   Description
Return value            The value returned by the called procedure is returned in this field
                        to the calling procedure. We can use a machine register for the
                        return value.
Actual parameters       Used by the calling procedure to supply parameters to the called
                        procedure.
Optional control link   Points to the activation record of the caller.
Optional access link    Used to refer to non-local data held in other activation records.
Saved machine status    Holds information about the state of the machine before the
                        procedure is called.
Local data              Holds data that is local to an execution of a procedure.
Temporaries             Temporary variables are stored in this field.

Activation record table
Displays 


An array of pointers that is used to access activation records is known as a
display.

There is one array entry for each nesting level, as given below
• Current activation record at level 1.
• Current activation record at level 2.
• Current activation record at level 3.
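Maintaining a display can be sketched as follows: on entry to a procedure at nesting level i, the old value of display[i] is saved in the new activation record and display[i] is pointed at that record; on exit the saved value is restored. A name declared at level k is then reached through display[k]. The `AR` record (holding only what the display needs) and the function names are hypothetical.

```c
#include <stddef.h>

/* Part of an activation record relevant to the display (hypothetical). */
typedef struct AR {
    const char *proc;    /* procedure name */
    int depth;           /* nesting level of the procedure, 1 = outermost */
    struct AR *saved;    /* display[depth] as it was before this activation */
} AR;

AR *display[8];          /* display[i]: current activation record at level i */

/* On procedure entry: remember the old entry, then point display[depth] here. */
void display_enter(AR *ar) {
    ar->saved = display[ar->depth];
    display[ar->depth] = ar;
}

/* On procedure exit: restore the entry saved at entry time. */
void display_exit(AR *ar) {
    display[ar->depth] = ar->saved;
}

/* A non-local name declared at level k is reached through display[k]. */
AR *frame_at(int level) {
    return display[level];
}
```

A recursive call at level 3 temporarily replaces display[3]; when it returns, the earlier activation at that level becomes current again.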


Error Detection and Recovery 

• The parser should be able to give error messages that are
meaningful.

• The parser should also be capable of recovering from errors so that it
can continue parsing the rest of the input.
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Error Recovery Techniques 

Panic Mode Error Recovery 

The parser skips input symbols until a synchronising token is found.
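Panic-mode recovery can be sketched as a loop that discards tokens until one from a synchronising set is seen. For brevity the sketch assumes each token is a single character and uses ';' and '}' as the synchronising set in the usage below; `panic_recover` is a hypothetical name. It returns the position just after the synchronising token, where parsing would resume.

```c
#include <string.h>

/* Panic-mode recovery: starting at the erroneous position pos, skip tokens
   until one from the synchronising set sync is found. Each token is one
   character here (a simplifying assumption). Returns the position just past
   the synchronising token, or the end of input if none is found. */
int panic_recover(const char *tokens, int pos, const char *sync) {
    while (tokens[pos] != '\0' && strchr(sync, tokens[pos]) == NULL)
        pos++;                                   /* discard this token */
    return tokens[pos] != '\0' ? pos + 1 : pos;  /* also consume the synchroniser */
}
```

For the input x=+*;y=1; with an error detected at the + (position 2), the tokens + and * are skipped, the ; at position 4 is consumed, and parsing resumes at y.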
Phrase Level Error Recovery 


Each empty entry in the parsing table is filled with a pointer to a specific
error routine that takes care of that error case.


Error Productions 


If we have a good idea of the common errors that might be encountered,
we can augment the grammar with productions that generate erroneous
constructs. When an error production is used by the parser, we can
generate appropriate error diagnostics. Since it is almost impossible to
know all the errors that can be made by programmers, this method is
not practical.


Global Correction 

Ideally, we would like a compiler to make as few changes as possible in
processing incorrect inputs. We have to analyze the input globally to find
the error. This is an expensive method and it is not used in practice.


Software Engineering 
and Information System 


Software Engineering 


Software engineering is defined as a discipline whose aim is the production
of quality software: software that is delivered on time, within budget, and that
satisfies its requirements.


Software 


Software consists of programs,
documentation of any facet of the
program, and the procedures used
to set up and operate the software
system. Basically, a program is a
combination of source code and
object code. Operating procedures
consist of instructions to set up
and use the software system and
instructions on how to react to system failure.


Software = Programs + Documentation + Operating procedures 















Software model 


Product and Process 
• What is delivered to the customer is called a product. It may include source code,
specification documents, manuals, documentation, etc.

• Process is the way in which we produce software. It is the collection of activities
that leads to (a part of) a product.
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Measures, Metrics and Measurement

• A measure provides a quantitative indication of the extent, dimension,
size, capacity, efficiency, productivity or reliability of some attributes of a
product or process.

• Measurement is the act of evaluating a measure.

• A metric is a quantitative measure of the degree to which a system
component or process possesses a given attribute.


Software Development Life Cycle (SDLC) 


A software life cycle is the period of time that starts when a software product
is conceived and ends when the product is no longer available for use. A
software development life cycle (often simply called a software life cycle) is a
particular abstraction that represents this life cycle.

The software development life cycle typically includes the following phases

1. Requirement phase
2. Design phase
3. Implementation phase
4. Test phase
5. Installation and check out phase
6. Operation and maintenance phase


Build and Fix Model 


Sometimes a product is constructed without a
specification. Basically, this is an ad hoc
approach and not well defined. It is a simple two-phase
model. The first phase is to write code
and the next phase is to fix it. Fixing in this
context may be error correction or addition of further functionality.

Two phase model


Waterfall Model 


The waterfall model is a sequential software development model in which
development is seen as flowing steadily downwards, like a waterfall,
through several phases. This model maintains that one should move to the
next phase only when its preceding phase is complete and perfect.
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Phases of development in the waterfall model are thus discrete and there is 
no jumping back and forth or overlapping between them. 


Requirement specification → System design → Implementation and unit testing →
Integration and system testing → Maintenance

Waterfall model


Problems of Waterfall Model 


• It is difficult to define all requirements at the beginning of a project.

• This model is not suitable for accommodating any change.

• A working version of the system is not seen until late in the project's life,
thus delaying the discovery of serious errors.


Prototype Model 


Here, we first develop a working prototype of the software instead of 
developing the actual software. 


Advantages of Prototyping Model 


• Users are actively involved in the development.

• It provides a better system to users, as users have a natural tendency to
change their minds in specifying requirements.

• A working model of the system is provided to the users, so they get a better
understanding of the system being developed.

• Errors can be detected much earlier, as the system is built side by side.
• Quick user feedback is available, leading to better solutions.
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Requirement 





↓
Design
↓
Implement
↓
Customer evaluation
• Not accepted by customer: refinement of requirements as per suggestions,
and the cycle repeats.
• Accepted by customer: continue with the steps below.
↓
System design
↓
Implementation and unit testing
↓
Integration and system testing
↓
Maintenance

Prototype model


Disadvantages of Prototyping Model 

• Practically, this methodology may increase the complexity of the system,
as the scope of the system may expand beyond the original plans.

• This model can lead to an implement-and-then-repair way of building
systems.


Iterative Enhancement Life Cycle Model 


This model counters the limitations of the waterfall model and combines the
benefits of both the prototyping and the waterfall models. The basic idea is that
the software should be developed in increments, where each increment
adds some functional capability to the system until the full system is
implemented. At each step, extensions and design modifications can be
made.
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An advantage of this approach is that it can result in better testing, since 
testing each increment is likely to be easier than testing the entire system, as
in the waterfall model.
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Iterative enhancement model 


Spiral Model 


This model was proposed by Barry Boehm. The
spiral model has many cycles. The radial dimension represents the
cumulative cost incurred in accomplishing the steps done so far, and the
angular dimension represents the progress made in completing each cycle
of the spiral. The spiral model is divided into a number of framework activities,
also called task regions.


Typically, there are between three and six task regions, as shown in the figure.


Task regions: customer communication, planning, risk analysis, engineering,
construction and release, customer evaluation

Spiral model


• Customer Communication: Tasks required to establish effective
communication between developer and customer.
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• Planning: Tasks required to define resources, timelines and other project-related
information.

• Risk Analysis: Tasks required to assess both technical and management
risks.

• Engineering: Tasks required to build one or more representations of the
application.

• Construction and Release: Tasks required to construct, test, install and
provide user support (e.g., documentation and training).

• Customer Evaluation: Tasks required to obtain customer feedback
based on evaluation of the software representations created during the
engineering stage and implemented during the installation stage.


Software Requirements 


Determining software requirements is the process of understanding the exact
requirements of the customer and documenting them properly. The hardest part of
building a software system is deciding precisely what to build.


Analysis and Specifications 
e Requirements describe the ‘what’ of a system not the ‘how’. 


• Requirements engineering produces one large document that contains a
description of what the system will do.


Problem statement → Requirement elicitation → Requirement analysis →
Requirement documentation → Requirement review

Various steps of analysis and specifications (requirement engineering)
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Requirement Elicitation 

This is also known as gathering of requirements. Here, requirements are
identified with the help of the customer and existing system processes, if
available.

The following methods can be used in requirement elicitation

• Interviews: The first step to understand the customer's problem statement,
i.e., meeting with the customer.

• Brainstorming Sessions: A kind of group discussion which may lead
to new ideas quickly and helps to promote creative thinking.

• Facilitated Application Specification Technique (FAST): The objective
of the FAST approach is to bridge the expectation gap, the difference between
what developers think they are supposed to build and what customers
think they are going to get.

• The Use Case Approach: This approach uses a combination of text and
pictures in order to improve the understanding of requirements.


Use case diagrams are graphical representations that may be 
decomposed into further levels of abstraction. 


Design of the Use Case Approach 


The following components are used for the design of the use case approach
• Actor (drawn as a stick figure)
• Use case (drawn as an oval)
• Relationship between an actor and a use case and/or between use cases
(drawn as a line)


An actor, or external agent, lies outside the system model but interacts with it in some
way. An actor may be a person, a machine or an information system.

A use case is initiated by a user with a particular goal in mind and completes
successfully when that goal is satisfied. It describes the sequence of
interactions between actors and the system necessary to deliver services
that satisfy the goal.


Requirement Analysis 

Requirement analysis allows the system analyst to refine the software 
allocation and build conceptual models of the data, functional and 
behavioural domains that will be treated by software. 
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Data Modeling 

• Define data objects, attributes and relationships.

• We use E-R diagrams for this purpose.

Behavioural Modeling

• Find out the different states of the system.

• Specify events that cause the system to change state.

• We use state transition diagrams for behavioural modeling.
Function Modeling

• Identify functions that transform data objects.

• Indicate how data flows through the system.
• Represent producers and consumers of data.


Symbols used in a DFD 


Symbol / Name                          Function
Data flow (arrow)                      Used to connect processes to each other and to sources
                                       or sinks; the arrowhead indicates the direction of data flow.
Process (circle)                       Performs some transformation of input data to yield
                                       output data.
Source or sink (rectangle,             A source of system inputs or a sink of system outputs.
external entity)
Data store (parallel lines)            A repository of data; the arrowheads indicate net inputs
                                       to and net outputs from the store.

Requirement Documentation 

A requirement document is the way to represent requirements in a
consistent format. The requirement document is called the SRS, i.e., Software
Requirements Specification. The SRS should be correct, unambiguous,
complete, consistent, verifiable, modifiable and traceable.


Key Points

+ For function modeling, we use Data Flow Diagrams (DFDs). A DFD shows the
flow of data through a system.

+ The requirement review process is carried out to improve the quality of the SRS.

+ The requirement review process may also be called requirements verification.

+ For maximum benefit, review and verification should not be treated as a
discrete activity to be done only at the end of preparation of the SRS. They should be
treated as a continuous activity that is incorporated into elicitation, analysis
and documentation.
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Requirement Validation 

After the SRS document is completed, we may like to check the
document for

• Completeness and consistency

• Conformance to standards

• Requirements conflicts

• Technical errors

• Ambiguous requirements


Validation Process with Inputs and Outputs 
The objective of requirements validation is to certify that the SRS is an acceptable 
document of the system to be implemented. 


Inputs: SRS document, organisational standards, organisational knowledge
Outputs: list of problems, approved actions


Software Risk Management 


Risk is a problem that could cause some loss or threaten the success of 
the project but which has not happened yet. Risk management means 
dealing with a concern before it becomes a crisis. 


Typical Software Risks 
These can be classified as 
Requirement Issue 


Many projects face uncertainty around the product's requirements. If we
do not control requirements-related risk factors, we might either build the
wrong product or build the right product badly.


Management Issue 


Project managers usually write the risk management plan, and most people
do not wish to air their weaknesses in public. If we do not confront such
tough issues, we should not be surprised if they bite us at some point.
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Lack of Knowledge 


The rapid rate of change in technologies and the increasing turnover of
skilled staff mean that our project teams may not have the skills we need
to be successful.


Risk Management Activities 


Risk management activities consist of two key areas: risk
assessment and risk control.


Risk management
• Risk assessment: risk identification, risk analysis, risk prioritisation
• Risk control: risk management planning, risk monitoring, risk resolution

Risk management activities


Risk Assessment 


It is the process of examining a project and identifying areas of potential
risk. Risk identification can be facilitated with the help of a checklist of
common risk areas of software projects. Risk analysis involves examining
how project outcomes might change with modification of risk input
variables. Risk prioritisation helps the project focus on its most severe risks
by assessing the risk exposure.


Risk Control 


It is the process of managing risks to achieve the desired outcomes. Risk
management planning produces a plan for dealing with each significant
risk. We should also monitor the project as development progresses by
periodically re-evaluating the risks and their probabilities. Risk resolution is
the execution of the plans for dealing with each risk.


Risk Estimation Method

The priority of each risk can be computed as
P = r * s

where r = the likelihood of a risk coming true and
s = the consequences of the problems associated with that risk.
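Computing P = r * s for each risk and sorting by P gives the prioritised list. A minimal sketch, assuming r is a probability between 0 and 1 and s is a consequence score on some chosen scale; the `Risk` record, the function names and the sample values in the usage are hypothetical.

```c
#include <stdlib.h>

/* A risk: r = likelihood of the risk coming true (0 to 1),
   s = consequence of the associated problems (e.g. a 1 to 10 score). */
typedef struct {
    const char *name;
    double r, s;
    double p;          /* priority P = r * s, filled in by prioritise */
} Risk;

/* qsort comparator: higher priority first. */
static int by_priority_desc(const void *a, const void *b) {
    double pa = ((const Risk *)a)->p, pb = ((const Risk *)b)->p;
    return (pa < pb) - (pa > pb);
}

/* Compute P = r * s for each risk, then sort the most severe risks to the front. */
void prioritise(Risk *risks, int n) {
    for (int i = 0; i < n; i++)
        risks[i].p = risks[i].r * risks[i].s;
    qsort(risks, (size_t)n, sizeof *risks, by_priority_desc);
}
```

For example, with r = 0.8, s = 9 (P = 7.2), r = 0.5, s = 4 (P = 2.0) and r = 0.3, s = 6 (P = 1.8), the first risk is placed at the top of the list.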


Software Design 


Software design translates requirements into a blueprint for constructing
the software. The purpose of the design phase is to produce a solution to the
problem given in the Software Requirement Specification (SRS) document.
Here, we generate a Software Design Document (SDD).


Design Concepts 
Design concepts basically address the three things given below
(i) Separation of function/data structure detail from a conceptual
representation of the software (abstraction and refinement).
(ii) Deciding criteria to partition software into individual components
(i.e., modularity).
(iii) Criteria to define the technical quality of a software design.


Abstraction 

Abstraction can be defined as the process of looking at what is essential to
your perspective or understanding without looking at the complex or low-level
details.


Types of Abstraction 
Two types of abstraction are available in modern programming languages.
• Data Abstraction, i.e., a named collection of data that describes a data object.

• Procedural Abstraction, i.e., a named sequence of instructions with a specific and
limited function.


Consider a phrase of two words, i.e., open door. In this phrase, open
could be an example of procedural abstraction and door could be an
example of data abstraction.

Open implies a long sequence of procedural steps, such as
• Walk to the door
• Reach out and grasp the knob
• Turn the knob and pull the door
• Step away from the moving door

The data abstraction for door would encompass a set of attributes that
describe the door, such as
• Door type
• Manufacturer
• Model number
• Swing direction
• Weight and dimensions




Refinement/Stepwise Refinement 

A hierarchy is developed by decomposing a macroscopic statement of 
function (a procedural abstraction) in a stepwise fashion until programming 
language statements are reached. 


Top-down Decomposition: In each step of the refinement, one or several
instructions of the given program are decomposed into more detailed
instructions. This successive decomposition or refinement of specifications
terminates when all instructions are expressed in terms of the underlying
computer or programming language.





Reach for knob;
Open door;
Walk through;
Close door;

Open door is refined into

repeat until door opens
    turn knob clockwise;
    if knob does not turn, then
        take key out;
        find correct key;
        insert in lock;
    end if
    pull/push door;
    move out of way;
end repeat

Refinement: a process of elaboration


Modularity 

Modularity is the process wherein software is divided into separately
named and addressable components, often called modules, that are
integrated to satisfy problem requirements. It uses the divide-and-conquer
approach for solving a complex problem by breaking it, or modularising it,
into smaller modules.


Control Hierarchy 


It represents the organisation of program components (modules) and
implies a hierarchy of control. It does not represent procedural aspects
of software such as the sequence of processes, or the occurrence or order of
decisions or repetitions, nor is it necessarily applicable to all architectural
styles.
Horizontal Partitioning

A module that controls other modules is called a manager module, while a
module controlled by another is known as a subordinate of the manager
module. Horizontal partitioning defines separate branches of the hierarchy
for each major program function. The simplest approach to horizontal
partitioning defines three partitions: input, data transformation (processing)
and output.


Vertical Partitioning 


It is also called factoring. It suggests that control (decision-making) and 
work should be distributed top-down in the program structure. 


Key Points
+ Top-level modules should perform control functions and do little processing 
work. 


+ Modules that reside lower in the structure should be the workers, performing 
all input, computations and output tasks. 
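As a rough sketch (the task and all function names are hypothetical), a manager module can own the control decisions while three worker modules form the input, processing and output branches of a horizontally partitioned hierarchy:

```python
def read_input(raw: str) -> list:
    """Worker (input branch): parse whitespace-separated integers."""
    return [int(token) for token in raw.split()]


def transform(values: list) -> int:
    """Worker (processing branch): the actual computation."""
    return sum(v * v for v in values)


def write_output(result: int) -> str:
    """Worker (output branch): format the result."""
    return f"sum of squares = {result}"


def main_controller(raw: str) -> str:
    """Manager module: makes the control decision, delegates all real work."""
    values = read_input(raw)
    result = transform(values) if values else 0
    return write_output(result)


print(main_controller("1 2 3"))  # sum of squares = 14
```

The manager contains a single decision and no computation of its own, matching the rule that top-level modules control while lower-level modules do the work.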


Functional Independence 


Functional independence is achieved by developing modules with
single-minded functionality (functionally cohesive) and an aversion to
excessive interaction with other modules (weak coupling). Functional
independence is a key to good design, and good design is a key to software
quality.

Functional independence is measured using two qualitative criteria 


(i) Cohesion (ii) Coupling 


Cohesion 


Cohesion is a measure of the degree to which the elements of a module 
are functionally related. 


Key Points
+ A strongly cohesive module implements functionality that is related to one 
feature of the solution and requires little or no interaction with other modules. 

+ Cohesion is the strength of the relationships among the elements within a module.


Here, an important design objective is to maximize the module cohesion 
and minimize the module coupling. 
Strength of Different Types of Cohesion

Types of Cohesion (from Best (High) to Worst (Low))
1. Functional cohesion (Best/High)
2. Sequential cohesion
3. Communicational cohesion
4. Procedural cohesion
5. Temporal cohesion
6. Logical cohesion
7. Coincidental cohesion (Worst/Low)


Coupling

Coupling is the measure of the degree of interdependence or
interconnection between modules.


Key Points
+ Two modules with high coupling are strongly interconnected and thus
dependent on each other.
+ Two modules with low coupling are less dependent, or not dependent, on one another.


[Figure: (a) Uncoupled, (b) Loosely coupled and (c) Highly coupled modules]


Strength of Different Types of Coupling

Types of Coupling (from Best (Low/Loose) to Worst (High/Strong))
1. Data coupling (Best: Low/Loose)
2. Stamp coupling
3. Control coupling
4. External coupling
5. Common coupling
6. Content coupling (Worst: High/Strong)
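The difference between the best form of coupling and a weaker one can be sketched in Python. The interest functions are hypothetical: in data coupling the caller passes only elementary data items, while in control coupling it also passes a flag that steers the callee's internal logic, so the caller must know how the callee works inside.

```python
# Data coupling: only elementary data items cross the module boundary.
def simple_interest(principal: float, rate: float, years: int) -> float:
    return principal * rate * years


# Control coupling: the caller also passes a flag that selects the callee's
# internal logic, so callers depend on how the callee is organised inside.
def interest(principal: float, rate: float, years: int, mode: str) -> float:
    if mode == "simple":
        return principal * rate * years
    if mode == "compound":
        return principal * ((1 + rate) ** years - 1)
    raise ValueError(f"unknown mode: {mode}")


print(simple_interest(1000, 0.05, 2))                 # 100.0
print(round(interest(1000, 0.05, 2, "compound"), 2))  # 102.5
```

Splitting `interest` into two data-coupled functions would remove the flag and, with it, the callers' knowledge of the callee's internal branches.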


Testing 


Testing refers to a defect detection mechanism and its purpose is to find
errors. Testing is a process of executing a program with the intent of finding an
error.


Verification 


It is a process of determining whether or not the products of a given phase
of software development fulfil the requirements established during the
previous phase. Verification is all about: are we building the product right?
Verification checks that the software product conforms to its specification.
We can do this with the help of the following

• Technical reviews and inspections
• Buddy checks, peer reviews
• Root cause analysis
• Metric definition
• Certification demonstrations


Key Points
+ A good test case is one that has a high probability of finding a yet
undiscovered error.
+ A successful test is one that uncovers a yet undiscovered error.


Validation


Validation is a process of evaluating a system or component during or at
the end of the development process to determine whether it satisfies the
specified requirements. Validation is all about: are we building the right
product? Validation checks that the software product meets the user's
expectations. We can do this with the help of the following

• Functional testing
• Integration and interface testing
• System testing
• Acceptance criteria
• Regression testing








[Figure: Validation, relating Customer, Analysis, Coding and Acceptance testing]














Testing Techniques 


Testing is the process of executing a program with the intent of finding
errors.


There are two types of testing techniques which are given below 


White Box Testing (Structural Testing) 

Testing based on the internal specification, with knowledge of how the
system is constructed. In this testing approach, we analyse the code
and use knowledge of the program structure to derive test data.

White box testing techniques are as given below
Basis path testing
• Flow graph notation
• Cyclomatic complexity
• Graph matrices
Control structure testing
• Loop testing

[Figure: Structure/white box testing]


Black Box Testing (Functional Testing) 

Testing based on the external specification without knowledge of how the
system is constructed. In this approach, testers need not have explicit
knowledge of the internal workings of the item being tested.
[Figure: Black box/functional testing. Input test data from the input domain, including inputs which cause anomalous behaviour, are fed to the System; the output test results in the output range include outputs which reveal the presence of defects.]
Black box testing techniques are given below
(i) Equivalence partitioning (ii) Boundary value analysis
(iii) Robustness testing
Here, some instances are given below where white box testing is better
than black box testing
(i) Logical errors (ii) Undetected memory overflow
(iii) Typographical errors
Some instances where black box testing is better than white box testing
(i) Functional requirements not met
(ii) Integration errors
(iii) Incorrect parameters passed between functions
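Boundary value analysis, robustness testing and equivalence partitioning can be sketched for a hypothetical input field that accepts integers from 1 to 100. The mid-range nominal value and the three partitions are assumptions chosen for illustration, not a prescribed procedure.

```python
def boundary_values(lo: int, hi: int) -> list:
    """Boundary value analysis: values at and just inside each edge, plus a nominal value."""
    nominal = (lo + hi) // 2
    return [lo, lo + 1, nominal, hi - 1, hi]


def robustness_values(lo: int, hi: int) -> list:
    """Robustness testing also probes values just outside the valid range."""
    return [lo - 1] + boundary_values(lo, hi) + [hi + 1]


def partition_of(x: int, lo: int, hi: int) -> str:
    """Equivalence partitioning: every value in one class should behave alike."""
    if x < lo:
        return "invalid (below range)"
    if x > hi:
        return "invalid (above range)"
    return "valid"


# Hypothetical field accepting 1..100.
print(boundary_values(1, 100))    # [1, 2, 50, 99, 100]
print(robustness_values(1, 100))  # [0, 1, 2, 50, 99, 100, 101]
```

One representative per equivalence class is usually enough, while boundary values are added because errors tend to cluster at the edges of a range.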


Types of Testing / Levels of Testing
There are mainly three levels of testing. A software product goes through 
these levels of testing. 

1. Unit testing 2. Integration testing 3. System testing 


Unit Testing 

Unit testing is the process of taking a module (the smallest unit of software
design) and running it in isolation from the rest of the software product,
using prepared test cases and comparing the actual results with the results
predicted by the specification and design of the module.

There are a number of reasons for testing units individually rather than testing the entire
product.

• The size of a single module is small enough that we can locate an error
fairly easily.
• Confusing interactions of multiple errors in widely different parts of the
software are eliminated.

Unit testing is white box oriented.
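A minimal unit test might look like the following, using Python's standard `unittest` module. The leap-year function is a hypothetical unit chosen only because its specification is easy to state; each test case compares the actual result with the result the specification predicts.

```python
import unittest


def is_leap_year(year: int) -> bool:
    """Hypothetical unit under test: the Gregorian leap-year rule."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


class LeapYearTest(unittest.TestCase):
    """Prepared test cases, each checking one clause of the specification."""

    def test_year_divisible_by_4_is_leap(self):
        self.assertTrue(is_leap_year(2024))

    def test_century_is_not_leap(self):
        self.assertFalse(is_leap_year(1900))

    def test_century_divisible_by_400_is_leap(self):
        self.assertTrue(is_leap_year(2000))

    def test_other_years_are_not_leap(self):
        self.assertFalse(is_leap_year(2023))


if __name__ == "__main__":
    # Run the unit on its own, isolated from the rest of the product.
    suite = unittest.TestLoader().loadTestsFromTestCase(LeapYearTest)
    unittest.TextTestRunner(verbosity=2).run(suite)
```

Because the cases were derived by looking at the branches of the rule (divisible by 4, by 100, by 400), this is a small white box style test of a single module.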
Integration Testing


Integration testing is used to test the integration and consistency of an
integrated subsystem. Integration testing is applied incrementally as
modules are assembled into larger subsystems. It is done using a
combination of both black box and white box testing techniques.


Testing Applied to Integrated Part of the System 


Testing is applied to subsystems, which are assembled in either of two ways
■ Top-down Assembles down from the highest level modules, replacing the lower
level modules by test stubs.
■ Bottom-up Assembles up from the lowest level modules, replacing the higher
level modules by test drivers.

A stub is a simplified program or dummy module designed to provide the
response that would be provided by the real sub-element.
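Stubs and drivers can be sketched in Python as follows. Every name here (`TaxServiceStub`, `invoice_total`, `tax_rate`, `tax_rate_driver`) is hypothetical: the stub stands in for an unfinished lower-level module during top-down integration, and the driver stands in for a missing higher-level caller during bottom-up integration.

```python
# --- Top-down integration: a stub replaces a lower-level module ---------------
class TaxServiceStub:
    """Stub: returns the canned response the real tax module would give."""

    def rate_for(self, region: str) -> float:
        return 0.10  # fixed answer, whatever the region


def invoice_total(amount: float, region: str, tax_service) -> float:
    """Higher-level module under test; it depends on a tax-rate sub-module."""
    return round(amount * (1 + tax_service.rate_for(region)), 2)


# --- Bottom-up integration: a driver replaces a higher-level caller ------------
def tax_rate(region: str) -> float:
    """Real lower-level module, written and tested first."""
    return {"north": 0.10, "south": 0.05}.get(region, 0.0)


def tax_rate_driver() -> bool:
    """Driver: feeds test inputs to tax_rate and checks the outputs."""
    cases = {"north": 0.10, "south": 0.05, "unknown": 0.0}
    return all(tax_rate(region) == expected for region, expected in cases.items())


print(invoice_total(200.0, "north", TaxServiceStub()))  # 220.0
print(tax_rate_driver())                                  # True
```

Passing the tax service in as a parameter is what lets the stub be swapped for the real module later without changing `invoice_total`.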
[Figure: Bottom-up integration testing]


System Testing 

It focuses on the complete integrated system to evaluate compliance with
specified requirements. It is basically used for performance, stress and
security testing. System testing includes the following testing techniques


Acceptance Testing 
This testing is performed before handing over the system to the customer.


Here, the customer may write the test criteria and request the developer to
execute them, or the developer can write the criteria and take the customer’s
approval.
Acceptance testing focuses on the complete integrated system to evaluate
fitness for use. It is done from the user's perspective.


Alpha and Beta Testing 

Alpha testing is done at the developer’s site by the customer. In alpha testing,
developers are present and the environment is controlled. Beta
testing is done at one or more customer sites by end users/customers. In
beta testing, the software faces live conditions and the developer may or may not be
present. Beta testing is usually used when the number of users runs
into millions.


Performance Testing 

This testing is concerned with assessing the time and memory aspects of
the system. Performance testing may be concerned with checking that an
operation completes within a fixed deadline and that only a fixed amount of
memory is allocated.


Regression Testing 

Regression testing is applied after changes have been made to the
system. The operation of the new version is compared with that of the previous
version to see whether there are any unexpected results.


Software Measurement 


In software measurement, software is measured to assess its efficiency and
accuracy. The functionality of software depends on the
measurement process: if software gives correct output in the software
measurement process, then we can be confident that it will give
meaningful and valuable information within a defined time.
Measurement in the physical world can be categorised in two ways.
Direct measures of the software engineering process
• It includes cost and effort applied.
• It also includes
(i) Lines of Code (LOC) produced
(ii) Execution speed (iii) Memory size
(iv) Defects reported over some period of time
Indirect measures of the product
• It includes
(i) Functionality (ii) Quality
(iii) Complexity (iv) Efficiency
(v) Reliability and maintainability etc.
Software Metrics (Direct Measure)


Software metrics can be defined as the continuous application of 
measurement based techniques to the software development process and 
its products to supply meaningful and timely management information 
together with the use of those techniques to improve that process and its 
products. 


Categories of Software Metrics
There are three main categories of software metrics which are given below 


Product metrics 
These metrics describe the characteristics of the product such as size, 
complexity, design features, performance, efficiency, reliability etc. 


Project metrics 


These metrics describe the project characteristics and execution. Examples
of project metrics are

• Number of software developers.
• Staffing pattern over the life cycle of the software.
• Cost and schedule.
• Productivity.


Process metrics 
These metrics describe the effectiveness and quality of the process that
produces the software product. Examples of process metrics are

• Effort required in the process.
• Time to produce the product.
• Maturity of the process.
• Number of defects found during testing.
• Effectiveness of defect removal during development.


Some Common Software Metrics 
LOC (Lines of Code) It specifies the size of software. LOC can also be used
to give a more precise characterisation of the common notions of small,
medium, large or very large projects e.g.,

Size    Small    Medium    Large     Very large
LOC     <2k      2k-8k     8k-32k    >32k


Person months (Man-months) It specifies the effort needed or spent on a 
project. 
Normalized Metrics


If measures are normalized, it is possible to create software metrics that 
enable comparison to broader organisational averages. 


Size Oriented Metrics 
These metrics are derived by normalizing quality and/or productivity
measures by considering the size of software that has been produced.


Productivity = LOC/Person month ; Quality = Defects/LOC ; Cost = Price/LOC 
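The size-oriented ratios can be computed directly. The project figures below are hypothetical, and `size_class` assumes the LOC bands <2k, 2k-8k, 8k-32k and >32k from the size table; how exact boundary values (8k, 32k) are assigned is an assumption of this sketch.

```python
def size_oriented_metrics(loc: int, person_months: float, defects: int, price: float) -> dict:
    """Normalise productivity, quality and cost by the size of the software (LOC)."""
    return {
        "productivity_loc_per_pm": loc / person_months,
        "quality_defects_per_loc": defects / loc,
        "cost_per_loc": price / loc,
    }


def size_class(loc: int) -> str:
    """Size bands <2k, 2k-8k, 8k-32k, >32k; exact 8k/32k go to the lower band (assumption)."""
    if loc < 2_000:
        return "Small"
    if loc <= 8_000:
        return "Medium"
    if loc <= 32_000:
        return "Large"
    return "Very large"


# Hypothetical project: 12,000 LOC, 24 person-months, 60 defects, price 180,000.
m = size_oriented_metrics(12_000, 24, 60, 180_000)
print(m["productivity_loc_per_pm"], m["quality_defects_per_loc"], m["cost_per_loc"])  # 500.0 0.005 15.0
print(size_class(12_000))  # Large
```

Because each figure is divided by size, two projects of very different sizes can be compared on the same scale, which is the purpose of normalization.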


Function Oriented Metrics 


These metrics use a measure of the functionality delivered by the 
application as a normalization value. Functionality is measured indirectly 
using other direct measures. 


Function Point (FP) Metrics 

This metric can easily be used to estimate the size of a software product
directly from the problem specification. The conceptual idea underlying the
function point metric is that the size of a software product is directly 
dependent on the number of different functions and features it supports. 
Function point is computed in two steps 


(i) The first step is to compute the Unadjusted Function Point (UFP). 


UFP = (Number of inputs) * 4 + (Number of outputs) * 5 + (Number of
inquiries) * 4 + (Number of files) * 10 + (Number of interfaces) * 10

(ii) Once the UFP is computed, the Technical Complexity Factor (TCF) is
computed next. The TCF refines the UFP measure by considering
14 other factors such as high transaction rates, throughput and
response time requirements etc. Each of these 14 factors is
assigned a value from 0 (not present or no influence) to 5 (strong influence).
The resulting numbers are summed, yielding the total Degree of Influence (DI).

$$TCF = 0.65 + 0.01 \times DI$$

As DI can vary from 0 to 70, the TCF can vary from 0.65 to 1.35. Finally,

$$FP = UFP \times TCF$$
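A sketch of the two-step function point calculation, using the weights of the UFP formula above. The counts and ratings are hypothetical, and each of the 14 factors is rated from 0 to 5, which is what makes the degree of influence DI range from 0 to 70 as stated.

```python
# Weights from the UFP formula.
WEIGHTS = {"inputs": 4, "outputs": 5, "inquiries": 4, "files": 10, "interfaces": 10}


def unadjusted_fp(counts: dict) -> int:
    """Step 1: weighted sum of the five counted items."""
    return sum(weight * counts[name] for name, weight in WEIGHTS.items())


def technical_complexity_factor(ratings: list) -> float:
    """TCF = 0.65 + 0.01 * DI, where DI is the sum of 14 ratings of 0..5."""
    if len(ratings) != 14 or not all(0 <= r <= 5 for r in ratings):
        raise ValueError("expected 14 ratings, each between 0 and 5")
    degree_of_influence = sum(ratings)
    return 0.65 + 0.01 * degree_of_influence


def function_points(counts: dict, ratings: list) -> float:
    """Step 2: FP = UFP * TCF."""
    return unadjusted_fp(counts) * technical_complexity_factor(ratings)


# Hypothetical counts, with every factor rated 3 ("average influence").
counts = {"inputs": 10, "outputs": 8, "inquiries": 5, "files": 3, "interfaces": 2}
ratings = [3] * 14
print(unadjusted_fp(counts))                       # 150
print(round(function_points(counts, ratings), 2))  # 160.5
```

Here DI = 42, so TCF = 1.07 and the adjustment raises the size estimate by 7% over the unadjusted count.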


Key Words Related to Normalized Metrics
■ Number of inputs Each data item input by the user is counted.
■ Number of outputs A set of related data items is counted as one output. The outputs
considered refer to reports printed, screen outputs, error messages produced etc.
■ Number of inquiries Number of distinct interactive queries which can be made by the
users. These inquiries are user commands which require specific action by the system.
■ Number of files Each logical file is counted. A logical file means a group of logically
related data. Thus, logical files can be data structures or physical files.
Indirect Measures
Indirect measures focus on software reliability and software quality 


Software Reliability 


It is defined as the ability of a system or component to perform its required
function under stated conditions for a specified period of time. Before the
deployment of software products, testing, verification and validation are
necessary steps which can ensure better software reliability in the product.


Software Quality 


One objective of software engineering is to produce good quality
maintainable software in time and within budget. If a product meets its
requirements, we may say it is a good quality product.





Key Points
+ Software reliability is the probability of failure-free software operation for a
specified period of time in a specified environment.
+ When we deal with software quality, a list of attributes (reliability, usability,
maintainability and adaptability) that are appropriate for the software is required
to be defined.


Software quality
• Reliability: Correctness, Robustness, Simplicity, Traceability, Consistency
• Maintainability: Modularity, Readability, Clarity of documentation
• Usability: Accuracy, Completeness, Efficiency, Testing, Clarity and accuracy of documentation
• Adaptability: Modifiability, Expandability, Portability

Software quality model
Project Estimation Techniques


The estimation of various project parameters is a basic project planning
activity. The important project parameters that are estimated include project
size, effort required to develop the software, project duration and cost.


There are three broad categories of estimation techniques


Empirical Estimation Techniques 


These techniques are based on making an educated guess of the project 
parameters. 


Heuristic Techniques 
This technique assumes that the relationships among the different project 
parameters can be modelled using suitable mathematical expression. 


Analytical Estimation Techniques 

This technique derives the required result starting with certain basic
assumptions regarding the project. Thus, unlike the empirical and heuristic
approaches, this approach does have a scientific basis. An example of an analytical
technique is Halstead’s software science.


Halstead Software Science 
Halstead used a set of primitive measures, which can be derived once the
design phase is completed and the code has been generated.


These measures are as follows
n1 = Number of distinct operators in a program
n2 = Number of distinct operands in a program
N1 = Total number of operators
N2 = Total number of operands
Program length N is the total number of operator and operand occurrences,
N = N1 + N2. It can be estimated from the distinct counts by the equation

$$\hat{N} = n_1 \log_2 n_1 + n_2 \log_2 n_2$$

Program volume (V), whose unit of information is bits, can be calculated by using the equation given below

$$V = N \log_2 (n_1 + n_2)$$

Program level (volume ratio) L is at most 1 and can be estimated as

$$L = \frac{2}{n_1} \times \frac{N_2}{N_2}\Big|_{\text{see below}}$$
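Given the four primitive counts, the Halstead measures follow mechanically. The counts below are hypothetical, since the rules for what counts as an operator or operand vary between tools and languages; note that the volume formula uses the actual length N = N1 + N2, while the length equation only estimates it.

```python
import math


def halstead(n1: int, n2: int, N1: int, N2: int) -> dict:
    """Halstead measures from distinct (n1, n2) and total (N1, N2) operator/operand counts."""
    N = N1 + N2                                      # actual program length
    N_hat = n1 * math.log2(n1) + n2 * math.log2(n2)  # estimated length
    V = N * math.log2(n1 + n2)                       # volume in bits
    L = (2 / n1) * (n2 / N2)                         # estimated program level (<= 1)
    D = (n1 / 2) * (N2 / n2)                         # difficulty = 1 / L
    E = D * V                                        # effort
    return {"N": N, "N_hat": N_hat, "V": V, "L": L, "D": D, "E": E}


# Hypothetical counts for a small routine.
m = halstead(n1=4, n2=4, N1=8, N2=8)
print(m)
```

With these counts the estimated and actual lengths coincide at 16, the volume is 16 × log2 8 = 48 bits, the difficulty is 4 and the effort is 192.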


COCOMO Model 


It is a heuristic estimation technique. It is also known as the Constructive
Cost Model. Software development projects can be classified into one of the
three categories based on the development complexity, i.e., organic,
semidetached and embedded.


Organic 

We can consider a development project to be of organic type, if the project 
deals with developing a well understood application program. The size of 
the development team is reasonably small and the team members are 
experienced in developing similar types of project. 


Semidetached 


A project is semidetached if the development team consists of a mixture of experienced and
inexperienced staff. Team members may have limited experience on
related systems and may be unfamiliar with some aspects of the system
being developed.


Embedded 

We apply this approach, if the software being developed is strongly 
coupled to complex hardware or if stringent regulations on the operational 
procedure exist. 

According to Boehm, software cost estimation should be done through three
stages, i.e., basic COCOMO, intermediate COCOMO and complete COCOMO.


Basic COCOMO Model 


Barry Boehm introduced a hierarchy of software estimation models named 
Constructive Cost Model. Basic equations of COCOMO model are 


Effort in person-months:

$$E = a \, (\text{KLOC})^{b}$$

Development time in months:

$$D = c \, (E)^{d}$$

where a, b, c and d are coefficients that have fixed values for different
classes of projects.


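Table for Coefficients in Basic COCOMO Model

Project         a      b       c      d
Organic         2.4    1.05    2.5    0.38
Semidetached    3.0    1.12    2.5    0.35
Embedded        3.6    1.20    2.5    0.32

The values a = 3.2 (organic), 3.0 (semidetached) and 2.8 (embedded), with the same
exponents b, are the nominal coefficients of intermediate COCOMO, where the effort is
further multiplied by an effort adjustment factor.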
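The basic COCOMO equations can be evaluated as below. The coefficients are Boehm's published basic COCOMO constants (a = 2.4, 3.0, 3.6 and b = 1.05, 1.12, 1.20 for organic, semidetached and embedded projects, with c = 2.5 and d = 0.38, 0.35, 0.32); the 32 KLOC project is hypothetical.

```python
# Basic COCOMO coefficients (a, b, c, d) for each project class.
BASIC_COCOMO = {
    "organic": (2.4, 1.05, 2.5, 0.38),
    "semidetached": (3.0, 1.12, 2.5, 0.35),
    "embedded": (3.6, 1.20, 2.5, 0.32),
}


def basic_cocomo(kloc: float, project_class: str) -> tuple:
    """Return (effort in person-months, development time in months)."""
    a, b, c, d = BASIC_COCOMO[project_class]
    effort = a * kloc ** b
    duration = c * effort ** d
    return effort, duration


# Hypothetical 32 KLOC organic project.
effort, duration = basic_cocomo(32, "organic")
print(f"E = {effort:.1f} person-months, D = {duration:.1f} months")  # E = 91.3 person-months, D = 13.9 months
```

Since b > 1, effort grows faster than size, while d < 1 means development time grows much more slowly than effort.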





Cyclomatic Complexity 


It is a software metric that provides a quantitative measure of the logical
complexity of a program. It defines the number of independent paths in the basis set of a program,
which provides an upper bound for the number of tests that must
be conducted to ensure that all statements are executed at least once.
Cyclomatic complexity V(G) for a flow graph G is defined as

$$V(G) = E - N + 2$$

where, N = Number of nodes
E = Number of edges

Cyclomatic complexity V(G) for a flow graph G can also be defined as V(G) = P + 1
where, P = Number of predicate nodes
Let’s take an example graph as shown in the figure.

[Figure: Graph for cyclomatic complexity, a flow graph with four regions (R1 to R4) and four independent paths (Path 1 to Path 4)]

Here, we can see that each new path introduces a new edge. The flow graph has four regions (R1, R2, R3, R4), and

V(G) = 10 edges - 8 nodes + 2 = 4





Key Points
+ An independent path is any path through the program that introduces at least one
new condition or new set of processing statements.
+ The number of regions of the flow graph corresponds to the cyclomatic
complexity.
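Both formulas for V(G) can be checked mechanically on a flow graph stored as an edge list. The graph below is hypothetical (an if-else at node 2 inside a loop whose test is node 6), and counting predicate nodes by out-degree assumes every decision is binary.

```python
from collections import Counter

# Hypothetical flow graph: 1 entry, 2 if-else, 3/4 branches, 5 join,
# 6 loop test (back to 2 or out to 7), 7 exit.
FLOW_GRAPH = [(1, 2), (2, 3), (2, 4), (3, 5), (4, 5), (5, 6), (6, 2), (6, 7)]


def cyclomatic_complexity(edges) -> int:
    """V(G) = E - N + 2 for a connected flow graph with one entry and one exit."""
    nodes = {node for edge in edges for node in edge}
    return len(edges) - len(nodes) + 2


def predicate_nodes(edges) -> int:
    """Count nodes with more than one outgoing edge (binary decisions assumed)."""
    out_degree = Counter(src for src, _ in edges)
    return sum(1 for degree in out_degree.values() if degree > 1)


print(cyclomatic_complexity(FLOW_GRAPH))  # 3
print(predicate_nodes(FLOW_GRAPH) + 1)    # 3
```

The two results agree (8 edges - 7 nodes + 2 = 3, and 2 predicates + 1 = 3), so three test cases suffice to cover a basis set of independent paths in this graph.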
Evolution of Quality System

• Product Inspection This method gave way to quality control.

• Quality Control It aims at correcting the causes of errors and not just
rejecting the defective products.

• Quality Assurance If an organisation’s processes are good and are
followed rigorously, the products are bound to be of good quality.

• Total Quality Management (TQM) The process followed by an
organisation must be continuously improved through process
measurements.


Inspection → Quality control → Quality assurance → Total Quality Management (TQM)

Quality assurance method


ISO 9000 Certification 


It specifies a set of guidelines for repeatable and high quality product development. 

ISO 9000 standard mainly addresses operational and organisational aspects such as
responsibilities, reporting etc.

It is a series of three standards ISO 9001, ISO 9002 and ISO 9003. 

■ ISO 9001 standard applies to the organisations engaged in design, development,
production and servicing of goods.


■ ISO 9002 standard applies to those organisations which do not design products but
are only involved in production.


■ ISO 9003 standard applies to organisations involved only in installation and testing
of the products.


Software Engineering Institute 
Capability Maturity Model (SEI-CMM) 


It is a strategy for improving the software process, irrespective of the actual
life cycle model used. CMM is used to judge the maturity of the software
processes of an organisation and to identify the key practices that are
required to increase the maturity of the process.
CMM is organized into five maturity levels as shown below

Level 5    Optimizing
Level 4    Managed
Level 3    Defined
Level 2    Repeatable
Level 1    Initial

Levels of SEI-CMM





Description of Levels of CMM (Maturity Levels)

CMM Level  | Focus                          | Key Process Areas
Initial    | Competent people               | -
Repeatable | Project management             | Software project planning, software configuration management
Defined    | Definition of processes        | Process definition, training program, peer reviews
Managed    | Product and process quality    | Quantitative process metrics, software quality management
Optimizing | Continuous process improvement | Defect prevention, process change management, technology change management
Six Sigma (σ)


It is a disciplined, data-driven approach to eliminating defects in any
process, from manufacturing to transactional and from product to service. A
six sigma defect is defined as any system behaviour that is not as per
customer specifications.


Key Points
+ The total number of six sigma opportunities equals the total number of chances for
a defect.
+ Six sigma can be used to improve every facet of business, from production to
human resources to order entry to technical support.


The six sigma sub-methodologies are given below 


DMAIC (Define, Measure, Analyse, Improve, Control) It is an improvement
system for existing processes falling below specification and looking for
incremental improvement.


DMADV (Define, Measure, Analyse, Design, Verify) It is an improvement
system used to develop new processes or products at six sigma quality level.


Software Maintenance 


It is an activity in which a program is modified after it has been put into use.
Major changes to the system's architecture are usually avoided during
maintenance.


Types of Software Maintenance 


Corrective 
It focuses on rectifying the bugs observed while the system is in use.


Adaptive

This kind of maintenance comes into the picture when customers need the
product to run on a new platform or a new operating system, or when they
need the product to be interfaced with new hardware or software.


Perfective
This focuses on enhancing the performance of the system.


Software Re-engineering 


It means restructuring or rewriting part or all of the system. Software
re-engineering is needed for applications which require frequent
maintenance. Advantages of software re-engineering are reduced risk and
reduced cost.


Web Technology 


HTML (Hyper Text Markup Language) 


HTML is a method by which ordinary text can be converted into hypertext. It is the
basic tool for designing a web page.


Hyper Text 


A way of creating multimedia documents, also a method for providing links 
within the documents. 


Markup Language 


A method for embedding special tags that describe the structure as well as 
behaviour of a document. 


HTML Tags 
Developing a web page is concerned with
• Contents of the page
• Appearance of the page


The appearance of the page is coded in HTML language, using HTML
tags. An attribute of a tag is additional information that is included inside
the start tag.

<hr size = "5" width = "50%" align = "right">

Tag used for horizontal line (hr is the tag name; size, width and align are its attributes)
Types of Tags
• Container tag It has an opening and a closing tag and encloses text or other tag
elements, e.g., <title> </title>.
• Empty tag It stands alone, with no closing tag and no enclosed content, e.g., <br>
(break line tag) and the hr tag with attributes given above.


Basic Structure of HTML Page 


The HTML page can be saved with a .html or .htm extension.
<HTML>
<HEAD>
<!-- Head elements are to be written here -->
</HEAD>
<BODY>
<!-- Body elements are to be written here -->
</BODY>
</HTML>


Here, <tag> is called the Starting Tag and </tag> is called the Closing Tag. The page starts
with the HTML tag, which declares that this page is an HTML page.


The HTML page is divided into two major parts. 
1. Head element 2. Body element 


Head Element 


Head element basically contains those tags which are useful to store some
information about the current HTML page but are not displayed by the web
browser. Some head tags are also used to handle events in HTML
documents. Head elements are described as follows


<title> Tag 
This tag is used to define the title of the web page. The text written
between the opening and closing title tags is displayed in the title bar
(or tab) of the browser window.

<title>First web page</title>


<base > Tag 
Base tag defines the default address or default target for all links on a page.
<html>
<head>
<base href="http://www.abc.com/myfolder/" target="_blank" />
</head>
<body>
<a href="mypage.html">Click Here</a>
</body>
</html>


Here, <a> </a> defines the link to another page. This tag is also
known as the Anchor Tag. When you click on Click Here, the browser opens
http://www.abc.com/myfolder/mypage.html in a new window.


The href attribute of the base tag defines a base URL for all relative URLs in
the page, and the target attribute of the base tag defines where to
open all links in the page.
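The effect of a base URL on a relative link can be checked with Python's standard `urllib.parse.urljoin`, which resolves relative references the same way a browser does for a simple case like this one. The sketch also shows why the trailing slash in the base href matters: without it, the last path segment is replaced rather than extended.

```python
from urllib.parse import urljoin

# Base URL from <base href="..."> combined with the relative href of <a href="...">.
with_slash = urljoin("http://www.abc.com/myfolder/", "mypage.html")
without_slash = urljoin("http://www.abc.com/myfolder", "mypage.html")

print(with_slash)     # http://www.abc.com/myfolder/mypage.html
print(without_slash)  # http://www.abc.com/mypage.html
```

So a base href meant to point at a folder should end with “/”; otherwise the folder name is treated as a file and dropped.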


Values of the target attribute are as follows

target = “_blank” opens the linked page in a new window.
target = “_parent” opens the link in the parent frame.
target = “_self” opens the link in the same frame (the default).
target = “_top” opens the link in the full body of the window.
target = “framename” opens the link in the frame specified by framename.


<link> Tag 
The <link> tag defines the relationship between a document and an
external resource.
<head>
<link rel="stylesheet" type="text/css" href="abc.css" />
</head>


Here, the link tag is used to specify that an external CSS file, i.e., abc.css,
has to be used in this HTML page.


Key Points


The attributes in link element are as follows 

+ Rel Specifies the relationship between the current document and the linked 
document. 

+ Type Specifies the MIME type of the linked document. 

+ Href Specifies the location of the linked document. 

+ Rev Specifies the relationship between the linked document and the current 
document. 

+ Charset Specifies the character encoding of the linked document. 


<meta> Tag
This provides metadata about the HTML document. Metadata is not
displayed on the page, but is machine parsable. The <meta> tag
always goes inside the <head>. The metadata can be used by browsers
(how to display content or reload the page), search engines (keywords) or
other web services.

<head>
<meta name="description" content="free tutorials" />
<meta name="keywords" content="HTML, CSS, XML" />
<meta name="author" content="Mohan" />
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
</head>


Here,

• Name Provides a name for the information in the content attribute.

• Http-equiv Provides an HTTP header for the information in the content
attribute. Values for this attribute include content-type, content-style-type,
expires, refresh and set-cookie.

• Content Specifies the content of the meta information.

• Scheme Specifies a scheme to be used to interpret the value of the
content attribute.
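Because metadata is machine parsable, a few lines with Python's standard `html.parser` module are enough to collect it. The `MetaCollector` class is a hypothetical helper written for this sketch; it records the `name` (or `http-equiv`) and `content` attributes of each meta tag in a copy of the example above.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Hypothetical helper: maps each meta tag's name (or http-equiv) to its content."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; attrs is a list of (name, value).
        # Self-closing tags such as <meta ... /> also reach this method by default.
        if tag == "meta":
            fields = dict(attrs)
            key = fields.get("name") or fields.get("http-equiv")
            if key is not None:
                self.meta[key] = fields.get("content")


PAGE = """<head>
<meta name="description" content="free tutorials" />
<meta name="keywords" content="HTML, CSS, XML" />
<meta name="author" content="Mohan" />
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
</head>"""

collector = MetaCollector()
collector.feed(PAGE)
print(collector.meta["keywords"])  # HTML, CSS, XML
```

A search engine or other web service reads the same name/content pairs, which is why none of this needs to appear in the rendered page.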


<script> Tag
It is used to define a client-side script, such as JavaScript. The script
element either contains scripting statements or points to an external
script file through the src attribute.
<head>
<script type="text/javascript">
document.write("Hello World!");
</script>
</head>


<style> Tag
Style tag is used to define style information for an HTML document. Inside
the style element we specify how HTML elements should render in a
browser.

<html>
<head>
<style type="text/css">
h1 {color: red}
p {color: blue}
</style>
</head>
<body>
<h1>Header 1</h1>
<p>A paragraph.</p>
</body>
</html>


<body> Tag 

The body tag contains all the content that is to be displayed in the web page. The
tags inside the body tag are used to format the text and images to be
displayed in the web page. They also help to align the various objects in the web
page.
+ The only possible value of the type attribute of the <style> tag is “text/css”.

+ The style element always goes inside the head section.



Text Formatting Tags 


• <b> Tag b stands for bold. The text written between <b> and </b>
will be displayed in bold letters.
• <i> Tag i stands for italic. The text written between <i> and
</i> will be displayed in italic format.
• <big> Tag This tag displays the text in a bigger size.
• <small> Tag This tag renders smaller text.
• <u> Tag The text written between <u> and </u> will be displayed as
underlined text.
• <q> Tag This tag defines a short quotation. The text will be quoted if
we use this tag.
<html> 
<body> 
<q> Here comes a short quotation </q> 
</body> 
</html> 
Output “Here comes a short quotation” 


• <strong> Tag This tag is used to display the text in bold format.
<sub> Tag This tag defines subscript text.
e.g.,
H<sub>2</sub>O
Output H₂O
<sup> Tag This tag is used to display superscripted text.
e.g.,
4<sup>th</sup> Edition
Output 4ᵗʰ Edition
Headings <h1> ... <h6> Tags The <h1> to <h6> tags are used to
define HTML headings. <h1> defines the largest and <h6> defines the
smallest heading.
<font> Tag This tag is used to describe the font style of a text.
e.g.,
<font color="red" size="16" face="arial">This is my font tag test</font>
The color attribute defines the color of the text. The value of the color
attribute can be given directly by name, by specifying a particular
six-digit hexadecimal number like color = “#FF0EE2”, or by an RGB
combination like color = “rgb(50, 50, 50)”. The size attribute defines the size
of the text and the face attribute defines the font type of the text.
<abbr> Tag The <abbr> tag describes an abbreviated phrase.
e.g.,
The <abbr title = "World Health Organisation">WHO</abbr> was founded in 1948.
Here, when we move the mouse over WHO, the browser displays the title as a tooltip.
Output “World Health Organisation”


<blockquote> Tag This tag defines a long quotation. A browser inserts
white space before and after the blockquote element and also indents it
with margins.

<del> Tag The <del> tag is used to define text that has been
deleted from a document. Browsers show the deleted text with a line through it.

e.g., The leader of this class is <del> Rahul </del> Neha.

Output The leader of this class is Rahul Neha, with “Rahul” struck through.
Text Alignment Tags
There are some tags that are used to align text into left, center, right or
justified position.
These tags are described as follows
(a) <p> Tag This tag is used to define a new paragraph. The align attribute of
the paragraph tag takes the values left, right, center or justify.
<p align = "left">
(b) <br> Tag This tag is used to insert a line break. The text
written after the <br> tag starts on the next line.
e.g., This is my first web page <br> And I learned everything about HTML.
Output This is my first web page
And I learned everything about HTML.


Note that the <br> tag has no closing tag.


(c) <pre> Tag Normally, a browser ignores the line breaks and extra spaces
written in the code, so a line break appears only where a <br> tag is used.
Thus, for every alignment the corresponding tag has to be written. The <pre>
tag eliminates this overhead: the text written between <pre> and </pre> is
displayed on the web page exactly as it is written in the code (usually in a
fixed-width font).


e.g., 
<pre> 
Hi! my name is Rahul and 
I love web technology. 
</pre> 
Output 
Hi! my name is Rahul and 
I love web technology. 


(d) <center> Tag This tag is used to align text to the center. The text
written between <center> and </center> is displayed in the center of the page.


Lists 

HTML supports three types of lists as follows 

• Unordered List (Unnumbered or Bulleted List)
• Ordered List (Numbered List)
• Definition List
Unordered List


An unordered list displays its items with bullets.
e.g., 
<body> 
<ul> 
<li> Coffee </li> 
<li> Milk </li> 
<li> Bread </li> 
</ul> 
</body> 
Output
• Coffee
• Milk
• Bread


Ordered List 
In an ordered list, the items are displayed in order, i.e., numbered.
e.g., 
<ol> 
<li> Oranges </li> 
<li> Peaches </li> 
<li> Grapes </li> 
</ol> 
Output 
1. Oranges 
2. Peaches 
3. Grapes 


Definition List 


A definition list is a list of items, each with a description. The <dl> tag
defines a definition list. The <dl> tag is used in conjunction with <dt>
(defines an item in the list) and <dd> (describes the item in the list).
e.g.,

<dl>
<dt> CS </dt>
<dd> - Computer Science </dd>
<dt> IT </dt>
<dd> - Information Technology </dd>
</dl>

Output
CS
    - Computer Science
IT
    - Information Technology


Tables 


Tables are very useful for presenting tabular information. For this
purpose, we use the <table> tag.


e.g.,


<table border = "5">
<caption> <b> Student Record </b> </caption>


<tr> 
<th> Roll No </th> 
<th> Student Name </th> 
</tr> 
<tr> 
<td> 001 </td> 
<td> Ram </td> 
</tr> 
<tr> 
<td> 002 </td> 
<td> Shyam </td> 
</tr> 
</table> 

Output

Student Record
Roll No    Student Name
001        Ram
002        Shyam
Table Elements 


Table elements are as follows
<table> </table> Defines a table in HTML.
<caption> </caption> Defines a caption (title) for the table. The attribute align = "bottom" can be used to position the caption below the table.
<tr> </tr> Specifies a table row within a table.
<th> </th> Defines a table header cell. By default the text in this cell is bold and centered.
<td> </td> Defines a table data cell. By default the text in this cell is aligned left and centered vertically.
Some of the table attributes are described below

• bgcolor Defines the background color of the table.

• border Specifies the width of the border around the table.

• cellpadding Specifies the space between the cell wall and the cell
content.

• cellspacing Specifies the space between cells.

• width Specifies the width of a table.

• align Aligns the content of a row in three ways, i.e., left,
right, center.

• valign Aligns the content of a cell vertically; its three values are
top, middle and bottom.

Image The <img> tag is used to display images on a web page. Images
are the binary representation of data in pixel form.
Inline image An inline image is inserted at a particular location “in a line”
within the web page.

Syntax <img src = "myfolder/abc.jpg" height = "50%" width = "50%" alt =
"mypics" border = "3" align = "middle">
Here, the src attribute defines the path of the image. The height attribute
defines the height of the image and can be given as a percentage or in pixels,
e.g., height = "100". Similarly, the width attribute defines the width of the image.


Key Points
+ The alt attribute defines the alternate text that is displayed if the image
cannot be shown.
+ The border attribute defines the border width around the image.

+ The align attribute defines the alignment of the image; its values can be left,
right, top, middle or bottom.


Link 
Links are used to jump from one HTML page to another HTML page. The <a>
(anchor) tag is used to create a link between two HTML pages.


e.g., <a href = "myfolder/nextpage.html"> Click Here </a> When we click
on “Click Here”, the browser opens nextpage.html in place of the current page.


HTML Form 


Forms provide communication from the user to the server and can add a lot of
interactivity to web pages.
Forms are used to get inputs from the user. The user input is submitted to
the server. A form is defined with the <form> ... </form> tag.


The form tag has three important attributes as follows


Action Attribute


This attribute specifies where to send the form data when a form is
submitted, i.e., what action is performed when the form is submitted.


e.g., If we want the program “/cgi-bin/comments.exe” to be executed when a
form is submitted, then we write


action = "/cgi-bin/comments.exe"


Method Attribute 


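This attribute defines how the form data is submitted to the server. There
are two methods, get and post. In the get method, the form data is appended
to the URL of the next page (after a “?”). In the post method, the data is sent
in the body of the HTTP request, so it does not appear in the URL.


e.g., <form method = "get" action = "a.exe"> or <form method = "post" action = "a.exe">


Enctype Attribute
The enctype (encoding type) attribute tells how the form data is to be encoded
when it is sent to the server (it is used with method = "post"). The default is
enctype = "application/x-www-form-urlencoded"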
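The get method and the default URL encoding can be illustrated with a short sketch. The field names "name" and "roll" and the action "a.exe" are hypothetical. `URLSearchParams` (available in browsers and Node.js) serializes name=value pairs in the application/x-www-form-urlencoded format: pairs are joined with "&", and a space is encoded as "+".

```javascript
// Sketch of the URL a browser would request for <form method="get" action="a.exe">.
// The form fields used here are hypothetical examples.
function buildGetUrl(action, fields) {
  // application/x-www-form-urlencoded: name=value pairs joined by "&",
  // spaces become "+", special characters are percent-encoded.
  const query = new URLSearchParams(fields).toString();
  return action + "?" + query;
}

console.log(buildGetUrl("a.exe", { name: "Ram Kumar", roll: "04" }));
// a.exe?name=Ram+Kumar&roll=04
```

With method = "post", the same encoded string (name=Ram+Kumar&roll=04) is sent in the request body rather than appended to the URL.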


Elements of a Form 
The elements of a form can be of three types

• Input box

• Selection list box or combo box

• Text area

Input Box and Type Values
Input Type        Example

Check Box         <input type = "checkbox" name = "CB">
Radio Button      <input type = "radio" name = "RB">
Text Field        <input type = "text" name = "TF">
Password          <input type = "password" name = "Pwd">
Hidden Field      <input type = "hidden" name = "HF">
Button            <input type = "button" name = "Btn">
Submit Button     <input type = "submit" name = "SB">
Reset Button      <input type = "reset" name = "RB">
Selection Box (Combo Box) The <select> tag is used to create a select
list (drop-down list).


The <option> tags inside the select element define the available options in
the list.
e.g.,

<select>
<option value = "volvo"> volvo </option>
<option value = "scoda"> scoda </option>
<option value = "Mercedes"> mercedes </option>
</select>


Text Area A multiline area in which the user can type text as input. It has
three attributes: rows (number of rows), cols (number of columns) and
name (the name of the variable).
e.g.,

<html>
<body>
<center> Enter your text
<textarea name = "Remarks" rows = "5" cols = "50">
</textarea>
</center>
</body>
</html>


Frames 

Frames allow us to divide a browser window into several independent
parts. They are a way of displaying two or more pages at once.

Frames can be used on a web page as given below

• To display a logo or other stationary information in one fixed portion of the
page.

• For the table of contents of a website, so that people can just click and move
around the website without having to go back to the contents page again and again.

Different Frame Tags

Tag          Name                     Description
<frame>      Frame definition         Defines a single frame in a frameset
<frameset>   Frame group definition   Container for frame elements
<iframe>     Inline frame             Defines an inline frame
Example for Cols

<frameset cols = "200, 400"> : sizes are in pixels
cols specifies the number of vertical windows (frames), and its values
specify the width of each frame in the frameset. We can give the values either
in pixels or in percentage. For percentage, we need to write the values with a
% sign. For pixels, we should not give any sign with the value, as in the above
example.


Example for Rows

rows specifies the number of horizontal windows (frames), and its values
specify the height of each frame, e.g., <frameset rows = "100, 40%, *">

The frame window will have three frames, where the height of the first frame is
100 pixels, the second is 40% of the main window and the third one
occupies the rest of the window space.
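The way a rows/cols value is split into frame sizes can be sketched as follows. This is a simplified model and an assumption, not the exact browser algorithm. Plain numbers are taken as pixels, "%" values as a share of the window, and "*" frames share whatever space is left. Real browsers apply extra rules, for example scaling the sizes when they do not add up to the window size.

```javascript
// Simplified sketch: resolve a rows/cols value such as "100, 40%, *" into
// frame sizes (in pixels) for a window that is `total` pixels high or wide.
function resolveFrameSizes(spec, total) {
  const parts = spec.split(",").map((s) => s.trim());
  const sizes = new Array(parts.length).fill(0);
  let used = 0;  // pixels taken by fixed and percentage frames
  let stars = 0; // number of "*" frames sharing the rest

  parts.forEach((p, i) => {
    if (p === "*") {
      stars += 1;
    } else if (p.endsWith("%")) {
      sizes[i] = Math.floor((parseFloat(p) * total) / 100); // share of the window
      used += sizes[i];
    } else {
      sizes[i] = parseInt(p, 10); // a plain number means pixels
      used += sizes[i];
    }
  });

  // "*" frames divide the remaining space equally.
  const remaining = Math.max(total - used, 0);
  parts.forEach((p, i) => {
    if (p === "*") sizes[i] = Math.floor(remaining / stars);
  });
  return sizes;
}

console.log(resolveFrameSizes("100, 40%, *", 600)); // [ 100, 240, 260 ]
console.log(resolveFrameSizes("200, 400", 600));    // [ 200, 400 ]
```

For a 600-pixel window, "100, 40%, *" gives the three heights described above: 100 pixels, 40% of 600 = 240 pixels, and the remaining 260 pixels.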


Attributes of Frame Tag

Attributes                              Description
1. frameborder (1/0)                    Renders a 3-D border around the frame. The default (1) inserts a border, 0 displays no border.
2. marginheight (number/percentage)     Controls the margin height (in pixels or in percentage) for the frame.
3. marginwidth (number/percentage)      Controls the margin width.
4. name = "text"                        Provides a target name for the frame.
5. noresize                             Prevents the user from resizing the frame.
6. scrolling (yes/no/auto)              Creates a scrolling frame.
7. src = "URL"                          Displays the source file for the frame.
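e.g.,

<frameset rows = "50%, 50%">
<frameset cols = "50%, 50%">
<frame src = "A.html">
<frame src = "B.html">
</frameset>
<frameset cols = "50%, 50%">
<frame src = "C.html">
<frame src = "D.html">
</frameset>
</frameset>

Output The window is divided into four equal frames: A.html and B.html in the
top row, C.html and D.html in the bottom row.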





XML 


XML stands for Extensible Markup Language. XML is designed to transport
and store data, whereas HTML is designed to display data.
XML tags are not predefined. You must define your own tags.
e.g.,

<?xml version = "1.0" encoding = "ISO-8859-1"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Do not forget me this weekend!</body>
</note>
The first line is the XML declaration. It defines the XML version (1.0) and the
encoding used (ISO-8859-1 = Latin-1/West European character set).
The next line describes the root element of the document (like saying: “this
document is a note”). The next 4 lines describe 4 child elements of the root
(to, from, heading and body) and finally the last line defines the end of the
root element.


Key Points


+ XML is used in many aspects of web development, often to simplify data
storage and sharing.

+ XML simplifies data sharing because in the real world, computer systems and
databases contain data in incompatible formats, while XML data is stored as plain text.

+ XML documents form a tree structure that starts at “the root” and branches to
“the leaves”.


Difference between XML and HTML
     XML | HTML
1. XML is designed to describe data and to focus on what data is; XML uses a document type definition or XML schema to describe the data. | HTML is designed to display data and to focus on how data looks.
2. XML tags are not predefined; we need to define our own tags. | The tags used to mark up HTML documents and the structure of HTML documents are both predefined.
3. In XML, data is stored in a separate XML file. | In HTML, data is stored inside the HTML page.
4. XML is about carrying information. | HTML is about displaying information.
Applications of XML


• Refined Search Results With XML-specific tags, search engines can
give users more refined search results. A search looks for the term in the
tags, rather than in the entire document.

• EDI (Electronic Data Interchange) Transactions XML allows data to be
exchanged regardless of the computing system.

• File Converters Many applications have been written to convert existing
documents into the XML standard.


XML Syntax Rules
<?xml version = "1.0" encoding = "ISO-8859-1"?>
<Student>
<Name>RAM</Name>
<RollNo>04</RollNo>
</Student>
• XML Declaration The XML page is declared as <?xml version = "1.0"
encoding = "ISO-8859-1"?>. The declaration is optional, but when it is used it
must be the first line of the XML page. It defines the XML version and the
character encoding used in the document.
• Root Element The XML root element is the tag inside which all the
other tags are included. In our example, <Student> is the root tag.
• Syntax rules of an XML page are given below
(i) All XML elements must have a closing tag.
(ii) XML tags are case sensitive.
(iii) XML tags must be properly nested.
(iv) XML documents must have a root element.
(v) XML attribute values must always be quoted.


XML Schema 


XML schema is a definition language for describing the structure of an XML
document.


Elements of XML Schema

• <xs:element> It is used to define elements.

• <xs:attribute> It is used to define an individual attribute.

• <xs:simpleType> Defines a type that has no attributes
and only text content.

• <xs:choice> Specifies that only one of the listed subelements can appear.

• <xs:sequence> Specifies that the listed subelements must appear in the given order.
Valid XML Document


An XML document is said to be valid if it has a Document Type
Definition (DTD) or XML schema associated with it and the document
complies with it.


Well Formed XML Document


An XML document with correct syntax (one that follows all the XML syntax
rules) is called a well-formed XML document.


DTD (Document Type Definition) 
A document type definition defines the legal building blocks of an XML
document. It defines the document structure with a list of legal elements
and attributes. A DTD can be defined inside an XML document, or an
external reference to it can be declared.
• Internal DTD If the DTD is defined inside the XML document, it should be
wrapped in a DOCTYPE definition with the following syntax.
<!DOCTYPE root-element [element-declarations]>
• External DTD If the DTD is defined in an external file, it should be
referenced in a DOCTYPE definition with the following syntax.
<!DOCTYPE root-element SYSTEM "filename">





Importance of DTD 


1. With a DTD, an XML file carries a description of its own format. 

2. With a DTD, independent groups of people can agree to use a standard
DTD for interchanging data.

3. A DTD can be used to verify that the data is valid.


XML Parser 


To use an XML document in an application, we need to parse it. An XML
parser reads an XML document and separates it into start tags, attributes,
text and end tags. The two types of XML parser are

1. DOM (Document Object Model) 2. SAX (Simple API for XML)


DOM (Document Object Model) 


The DOM parser presents an XML document as a tree structure with
the elements, attributes and text defined as nodes. The DOM converts an
XML document into a collection of objects in an object model with a tree
structure.
• Benefits of using DOM DOM makes it easy to modify data, to remove it
or to insert new data.

• Advantages of using DOM DOM is very useful when a document is small,
because the whole tree is kept in memory.

• Application The DOM model is useful for interactive applications.





SAX (Simple API for XML)


SAX works in serial access mode to parse XML documents. SAX is also
called an event-driven protocol because it implements the technique of
registering handlers whose callback methods are invoked whenever an
event is generated. The SAX handlers are the DTD handler, the content
handler, the error handler and the entity resolver.





Key Points
+ SAX provides a mechanism for reading data from an XML document that is an
alternative to that provided by the DOM.


+ Whereas the DOM operates on the document as a whole, a SAX parser operates
on each piece of the XML document sequentially.


+ DOM provides access to the information stored in our XML documents as a
hierarchical object model.


Sample XML Document 
<?xml version = "1.0"?>
<Book>
<BookName>WebTech</BookName>
<Author>
<Author1>RAM</Author1>
<Author2>Shyam</Author2>
</Author>
<Publisher>Arihant</Publisher>
<Price>500</Price>
</Book>


SAX Approach for the Sample XML Document

The SAX parser generates the following events, in order
• Start Element Book
• Start Element BookName
• End Element BookName
• Start Element Author
• Start Element Author1
• End Element Author1
• Start Element Author2
• End Element Author2
• End Element Author
• Start Element Publisher
• End Element Publisher
• Start Element Price
• End Element Price
• End Element Book
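The event-driven idea can be sketched with a toy parser. This is an assumption-laden illustration, not a real SAX library. It scans the document once from start to end and calls a handler method for every start tag, end tag and piece of text, keeping no tree in memory. It only copes with simple documents like the sample (no attributes, comments, CDATA or entities). The price is written here as 500.

```javascript
// Toy SAX-style parser: one pass over the text, one callback per event.
function saxParse(xml, handler) {
  // One token per match: XML declaration | end tag | start tag | text.
  const tokenRe = /<\?[^>]*\?>|<\/([A-Za-z][\w.-]*)\s*>|<([A-Za-z][\w.-]*)\s*>|([^<]+)/g;
  let match;
  while ((match = tokenRe.exec(xml)) !== null) {
    if (match[1]) handler.endElement(match[1]);
    else if (match[2]) handler.startElement(match[2]);
    else if (match[3] && match[3].trim()) handler.characters(match[3].trim());
    // The XML declaration matches none of the groups and is skipped.
  }
}

const sample = `<?xml version="1.0"?>
<Book>
  <BookName>WebTech</BookName>
  <Author>
    <Author1>RAM</Author1>
    <Author2>Shyam</Author2>
  </Author>
  <Publisher>Arihant</Publisher>
  <Price>500</Price>
</Book>`;

// The handler only records the events in the order they arrive.
const events = [];
saxParse(sample, {
  startElement: (name) => events.push("Start Element " + name),
  endElement: (name) => events.push("End Element " + name),
  characters: (text) => events.push("Text " + text),
});

console.log(events.join("\n"));
```

Ignoring the "Text" events, the printed order is the 14-event sequence listed above.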




DOM Approach for the Sample XML Document
Book
    BookName
    Author
        Author1
        Author2
    Publisher
    Price
Tree representation of the XML document
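By contrast, the DOM approach reads the whole document first and builds the tree shown above in memory. After that, the program can walk the tree, change it, insert nodes or remove them. The sketch below is a toy illustration with the same limits as a minimal parser (no attributes, comments or entities). It is not the W3C DOM API; the node shape `{ name, children, text }` is a hypothetical choice.

```javascript
// Toy DOM-style parser: build the complete tree, then work on it.
function buildTree(xml) {
  const tokenRe = /<\?[^>]*\?>|<\/([A-Za-z][\w.-]*)\s*>|<([A-Za-z][\w.-]*)\s*>|([^<]+)/g;
  const doc = { name: "#document", children: [], text: "" };
  const stack = [doc]; // path from the document node to the open element
  let match;
  while ((match = tokenRe.exec(xml)) !== null) {
    const current = stack[stack.length - 1];
    if (match[2]) {
      // Start tag: create a child node and make it the open element.
      const node = { name: match[2], children: [], text: "" };
      current.children.push(node);
      stack.push(node);
    } else if (match[1]) {
      // End tag: it must close the element that is currently open (proper nesting).
      if (current.name !== match[1]) throw new Error("Mismatched end tag </" + match[1] + ">");
      stack.pop();
    } else if (match[3]) {
      current.text += match[3].trim(); // whitespace between tags is ignored
    }
  }
  if (stack.length !== 1) throw new Error("Unclosed element");
  return doc.children[0]; // the root element
}

// Print the tree, indenting two spaces per level.
function printTree(node, depth = 0) {
  console.log("  ".repeat(depth) + node.name + (node.text ? " = " + node.text : ""));
  node.children.forEach((child) => printTree(child, depth + 1));
}

const book = buildTree(`<?xml version="1.0"?>
<Book>
  <BookName>WebTech</BookName>
  <Author>
    <Author1>RAM</Author1>
    <Author2>Shyam</Author2>
  </Author>
  <Publisher>Arihant</Publisher>
  <Price>500</Price>
</Book>`);

book.children[3].text = "450"; // illustrative edit: once the tree exists, changing data is easy
printTree(book);
```

The whole tree stays in memory, which is why the DOM approach suits small documents and interactive applications.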


Client/Server Model of Computing 


In client/server computing, all clients communicate with a server in the
computer network. Both the clients and the server are nodes
(communication points) on the network. The arrangement of the nodes in a
network is called the network topology. In the client/server computing model,
two computers play an important role


(1) Client (2) Server


Client/Server Definition


Client/server is a network architecture in which each computer or process on
the network is either a client or a server. Servers are powerful computers or
processes dedicated to managing resources such as disk drives (file servers)
and printers (print servers). Clients are personal computers or workstations on
which users run applications.

The client/server model is a form of distributed computing where one
program (the client) communicates with another program (the server) for
the purpose of exchanging information.


Client/Server Architecture 


A server is a computer system that selectively shares its resources; a client 
is a computer or computer program that initiates contact with a server in 
order to make use of a resource. 
The client/server architecture is sometimes called a 2-tier architecture.

Figure: the client sends a request to the server, and the server sends a response back to the client.


The client's responsibilities are usually to

• Handle the user interface.
• Translate the user's request into the desired protocol.
• Send the request to the server.
• Wait for the server's response.
• Translate the response into human-readable format.
• Present the results to the user.

The server's functions include

• Listen for a client's query/request.
• Process that query/request.
• Return the results back to the client.

A typical client/server interaction goes like this

• The user runs client software to create a query.
• The client connects to the server.
• The client sends the query to the server.
• The server analyses the query.
• The server computes the results of the query.
• The server sends the results to the client.
• The client presents the results to the user.
• Repeat as necessary.
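The division of work listed above can be sketched as an in-process simulation. No real network is used. The request and response are plain strings, and the tiny text protocol ("GET <key>" answered by "OK <value>" or "ERROR <reason>") is hypothetical. The student data reuses the Roll No/Name table from earlier in the chapter.

```javascript
// In-process simulation of a client/server exchange (no real network).
class Server {
  constructor(data) {
    this.data = data; // the resource the server shares
  }
  // Process a request: analyse the query, compute the result, return it.
  handle(request) {
    const [command, key] = request.split(" ");
    if (command !== "GET") return "ERROR unknown command";
    return key in this.data ? "OK " + this.data[key] : "ERROR not found";
  }
}

class Client {
  constructor(server) {
    this.server = server; // stands in for "connecting" to the server
  }
  // Translate the user's question into the protocol, send it, wait for the
  // response, then turn the response into a human-readable message.
  ask(rollNo) {
    const request = "GET " + rollNo;
    const response = this.server.handle(request);
    return response.startsWith("OK ")
      ? "Student " + rollNo + " is " + response.slice(3)
      : "Sorry, " + response.slice("ERROR ".length);
  }
}

const server = new Server({ "001": "Ram", "002": "Shyam" });
const client = new Client(server);
console.log(client.ask("001")); // Student 001 is Ram
console.log(client.ask("003")); // Sorry, not found
```

In a real system, `handle` would run on another machine, and the request and response would travel over the network using a protocol such as HTTP.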


Key Points


+ Specific types of clients include web browsers, e-mail clients and online
chat clients.

+ Specific types of servers include web servers, FTP servers, application servers,
database servers, mail servers, file servers, print servers and terminal servers.

+ Most web services are also types of servers.


Scripting Language 


A scripting language, script language or extension language is a
programming language that allows control of one or more applications.
Scripts are distinct from the core code of the application, as they are
usually written in a different language and are often created or at least
modified by the end user. Scripts are often interpreted from source code or
byte code, whereas the application is typically first compiled to native
machine code.


JavaScript

• A scripting language, i.e., a lightweight programming language.
• Designed to add interactivity to HTML pages.
• Usually embedded directly into HTML pages.
• Interpreted language (executes without preliminary compilation).


Key Points


+ Java and JavaScript are not the same, either in concept or in design.
+ Java is a powerful and much more complex programming language.


Advantages of JavaScript
The advantages of JavaScript are as given below

(i) JavaScript gives HTML designers a programming tool.

(ii) JavaScript can put dynamic content into an HTML page. A statement
like document.write("<h1>" + name + "</h1>") can write variable
text into an HTML page.

(iii) JavaScript can react to events.

(iv) JavaScript can read and write HTML elements.


(v) JavaScript can be used to validate data before it is submitted to a
server.


(vi) JavaScript can be used to detect the visitor's browser.
(vii) JavaScript can be used to create cookies.
Loops in JavaScript


Loops in JavaScript are used to execute the same block of code a
specified number of times or while a specified condition is true.


e.g.,


<html>
<body>
<script type = "text/javascript">
var i = 0;
for (i = 0; i <= 10; i++)
{
// print the current value followed by a line break
document.write("The number is " + i);
document.write("<br>");
}
</script>
</body>
</html>
In the same way, we can use a while loop for this.

The for...in loop has the following form
for (variable in object)
{
code to be executed
}


The for...in statement is used to loop (iterate) through the elements of an array
or through the properties of an object. Here, variable can be a named
variable, an array element or a property of an object.
<html>
<body>
<script type = "text/javascript">
var x;
var mycars = new Array();
mycars[0] = "Audi";
mycars[1] = "BMW";
for (x in mycars)
{
document.write(mycars[x] + "<br>");
}
</script>
</body>
</html>
Output Audi
BMW
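The while-loop version of the 0 to 10 count, and the for...in loop over the same two cars, can be sketched as follows. Output is collected in arrays instead of being written with document.write, so the sketch also runs outside a browser. The array names are illustrative.

```javascript
// for loop: initialisation, condition and update written in one line.
const forOutput = [];
for (let i = 0; i <= 10; i++) {
  forOutput.push("The number is " + i);
}

// Equivalent while loop: the same three parts are written separately.
const whileOutput = [];
let j = 0;            // initialise before the loop
while (j <= 10) {     // the condition is tested before each pass
  whileOutput.push("The number is " + j);
  j++;                // update inside the body
}

// for...in over an array: x takes the indices "0", "1" (as strings).
const mycars = ["Audi", "BMW"];
const carOutput = [];
for (const x in mycars) {
  carOutput.push(mycars[x]);
}

console.log(whileOutput.length, carOutput.join(", ")); // 11 Audi, BMW
```

Both counting loops produce the same 11 lines, "The number is 0" to "The number is 10".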
JavaScript Functions


• To keep the browser from executing a script when the page loads, we
can put our script into a function.


• A function contains code that will be executed by an event or by a call to
that function.


• Functions can be defined both in the <head> and in the <body> section
of a document. However, to ensure that the function is read/loaded by the
browser before it is called, it is wise to put it in the <head> section.


e.g.,
<html>
<head>
<script type = "text/javascript">
function displaymessage()
{
alert("All the Best");
}
</script>
</head>
<body>
<form>
<input type = "button" value = "click me" onclick =
"displaymessage()">
</form>
</body>
</html>


Key Points


+ JavaScript is case sensitive In JavaScript, “myFunction” and “MyFunction”
are not the same.
+ White space JavaScript ignores extra spaces. We can add white spaces to
our script to make it more readable. The following lines are equivalent
name="Arihant";
name = "Arihant";
+ Breaking up a code line We can break up a code line within a text string by
using a backslash “\”.
e.g.,
document.write("Hello \
World!");
However, we can't break up a code line like this:
document.write \
("Hello World!");
HTML Comments to Handle Simple Browsers


Browsers that do not support JavaScript will display JavaScript as page
content. To prevent them from doing this, and as a part of the JavaScript
standard, the HTML comment tag can be used to hide the JavaScript.
<script type = "text/javascript">
<!--
document.write("Hello");
//-->
</script>
The two slashes (//) before --> are a JavaScript comment, which stops
JavaScript from trying to execute the --> part.


document.write() It is a standard JavaScript command for writing
output to a page or displaying text to the user.


The return Statement


The return statement is used to specify the value that is returned from the
function.
e.g.,
<html>
<head>
<script type = "text/javascript">
function product(a, b)
{
var x = a * b;
return x;
}
</script>
</head>
<body>
<script type = "text/javascript">
var a = product(2, 3);
document.write("product is = " + a);
</script>
</body>
</html>
Output product is = 6
Conditional Statements
The example below uses the if, else if and else statements to choose which
greeting to display.
e.g.,
<html>
<body>
<script type = "text/javascript">
var d = new Date();
var time = d.getHours();
if (time < 10)
{
document.write("<b>Good Morning</b>");
}
else if (time >= 10 && time < 16)
{
document.write("<b>Good Day</b>");
}
else
{
document.write("<b>Hello world !</b>");
}
</script>
</body>
</html>
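The same if / else if / else chain can be tried with any hour, not just the current time, by moving it into a function that takes the hour as a parameter. The function name `greeting` is hypothetical. Only the first condition that is true selects a block; the final else catches every remaining hour.

```javascript
// The greeting chain as a function of the hour (0 to 23).
function greeting(hour) {
  if (hour < 10) {
    return "Good Morning";
  } else if (hour >= 10 && hour < 16) {
    return "Good Day";
  } else {
    return "Hello world !";
  }
}

console.log(greeting(new Date().getHours())); // depends on the current time
console.log(greeting(9));  // Good Morning
console.log(greeting(10)); // Good Day
console.log(greeting(20)); // Hello world !
```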


Java Script Events 

Events are actions that can be detected by JavaScript. Every element on a
web page has certain events, which can trigger JavaScript functions.

We define the events in the HTML tags.


Example of Events


(a) A mouse click
(b) A web page or an image loading
(c) Mouse over a hot spot on the web page
(d) Selecting an input box in an HTML form
(e) Submitting an HTML form
(f) A keystroke


Events are normally used in combination with functions, and the function
will not be executed before the event occurs.
Onload and Onunload

These events are triggered when the user enters or leaves the page. The
onload event is often used to check the visitor's browser type and browser
version, and to load the proper version of the web page based on that
information.


Onsubmit
This event is used to validate all form fields before the form is submitted.
<form method = "post" action = "abc.html"
onsubmit = "return checkForm()">


The function checkForm() returns either true or false. If it returns true, the
form is submitted; otherwise the submission is cancelled.
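The validation logic itself can be sketched as a helper that checkForm() might call. Both `isFormValid` and its rules are hypothetical examples. In a page, the field values would be read from the form; here they are passed in as a plain object so the checks can run anywhere.

```javascript
// Hypothetical validation helper for the onsubmit example.
function isFormValid(fields) {
  // Rule 1: no field may be empty.
  for (const name in fields) {
    if (String(fields[name]).trim() === "") return false;
  }
  // Rule 2 (assumed for illustration): a roll number contains digits only.
  if ("roll" in fields && !/^\d+$/.test(fields.roll)) return false;
  return true;
}

// Returning false from onsubmit cancels the submission; true lets it proceed.
console.log(isFormValid({ name: "Ram", roll: "04" })); // true
console.log(isFormValid({ name: "", roll: "04" }));    // false
console.log(isFormValid({ name: "Ram", roll: "4a" })); // false
```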


JavaScript switch Statement

Just as we use if...else conditional statements to choose between blocks of
code, we can use the switch statement in JavaScript to select one of many
blocks of code to be executed.
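A minimal switch example follows (the function name `dayName` is illustrative). The value is compared with each case label using strict equality (===). `break` stops execution from falling through into the next case, and `default` runs when no case matches. The day numbers follow `Date.prototype.getDay()`, where 0 is Sunday and 6 is Saturday.

```javascript
// Select one of several blocks with switch.
function dayName(day) {
  let name;
  switch (day) {
    case 0:
      name = "Sunday";
      break;            // leave the switch; do not run the next case
    case 6:
      name = "Saturday";
      break;
    default:            // any other value
      name = "Weekday";
  }
  return name;
}

console.log(dayName(new Date().getDay())); // depends on today's date
console.log(dayName(0)); // Sunday
console.log(dayName(3)); // Weekday
```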


Switching Theory and 
Computer Architecture 


Number System 


A number system is a way of representing numbers with a limited number of
symbols. The decimal number system is said to be of base or radix 10
because it uses ten digits (0-9) to represent any number.
To distinguish between numbers of different bases, we enclose the value in
parentheses and write the base as a subscript.
e.g., If 10 is a decimal value, then it is written as $(10)_{10}$.
If 10 is a binary value, then it is written as $(10)_2$.

Each digit (coefficient) is multiplied by a power of the base (radix):

$$(7182)_{10} = 7 \times 10^3 + 1 \times 10^2 + 8 \times 10^1 + 2 \times 10^0$$

$$(01101)_2 = 0 \times 2^4 + 1 \times 2^3 + 1 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 = (13)_{10}$$

So, a number system with radix r will have r symbols, and any number will
be interpreted as

$$a_n r^n + a_{n-1} r^{n-1} + \cdots + a_1 r^1 + a_0 r^0 + a_{-1} r^{-1} + \cdots + a_{-m} r^{-m}$$
Basic Number Systems

Number System    Symbols Used                                        Base or Radix
Binary           0, 1                                                2
Octal            0, 1, 2, 3, 4, 5, 6, 7                              8
Decimal          0, 1, 2, 3, 4, 5, 6, 7, 8, 9                        10
Hexadecimal      0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F      16
Key Points


+ The symbols $(a_n, a_{n-1}, \ldots, a_0)$ should be among the r symbols allowed in the
number system (i.e., from 0 to r − 1). Here, $a_n$ is called the most significant
digit and $a_0$ is called the least significant digit.


Number Base Conversions 


Decimal to Binary 


We can understand this with the help of an example, converting decimal
31 to binary.


Decimal to Binary Conversion

Integer ÷ 2     Quotient    Remainder    Coefficient
31 ÷ 2          15          1            $a_0 = 1$
15 ÷ 2          7           1            $a_1 = 1$
7 ÷ 2           3           1            $a_2 = 1$
3 ÷ 2           1           1            $a_3 = 1$
1 ÷ 2           0           1            $a_4 = 1$

The binary number is written as $(a_4 a_3 a_2 a_1 a_0)_2$, i.e., $(11111)_2$.


Process 


The given number is divided by 2 to give an integer quotient and a
remainder. The quotient is again divided by 2 to give a new quotient and
remainder. The process is continued until the integer quotient becomes 0.
The coefficients of the desired binary number are obtained from the
remainders, as shown in the table.
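The repeated-division procedure can be sketched as a small function. The name `intToRadix` is illustrative, and the input is assumed to be a non-negative integer. Each remainder is the next coefficient ($a_0$ first), so each new digit is placed to the left of the ones already found. Passing 8 instead of 2 gives the decimal-to-octal conversion described next, and 16 gives hexadecimal.

```javascript
// Repeated division: the remainders are the coefficients a0, a1, a2, ...
function intToRadix(n, radix) {
  if (n === 0) return "0";
  const digits = "0123456789ABCDEF";
  let result = "";
  while (n > 0) {
    const remainder = n % radix;         // coefficient a_k
    result = digits[remainder] + result; // a_k goes to the left of a_(k-1)
    n = Math.floor(n / radix);           // integer quotient for the next step
  }
  return result;
}

console.log(intToRadix(31, 2)); // 11111
console.log(intToRadix(41, 8)); // 51
```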
e.g., For the fractional part

Convert $(0.1245)_{10}$ to binary up to 6 bits.

Process

First, $(0.1245)_{10}$ is multiplied by 2 to give an integer and a fraction. The new
fraction is multiplied by 2 to give a new integer and a new fraction. This
process is continued until the fraction becomes 0 or until the number of
digits gives sufficient accuracy. The coefficients of the binary number are
obtained from the integers, as shown in the table.


Coefficient of Binary Number

Fraction × 2            Fraction    Integer    Coefficient
0.1245 × 2 = 0.249      0.249       0          $a_{-1} = 0$
0.249 × 2 = 0.498       0.498       0          $a_{-2} = 0$
0.498 × 2 = 0.996       0.996       0          $a_{-3} = 0$
0.996 × 2 = 1.992       0.992       1          $a_{-4} = 1$
0.992 × 2 = 1.984       0.984       1          $a_{-5} = 1$
0.984 × 2 = 1.968       0.968       1          $a_{-6} = 1$

Thus, $(0.1245)_{10} = (0.000111)_2$, i.e., $(0.a_{-1} a_{-2} a_{-3} a_{-4} a_{-5} a_{-6})_2$.
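The repeated-multiplication steps in the table can be sketched as follows (`fractionToRadix` is an illustrative name). The integer part of each product is the next coefficient ($a_{-1}$ first). The loop stops after the requested number of digits, or earlier if the fraction becomes 0. For base 2, doubling and subtracting the integer part are exact in floating point, so the bits produced are exact.

```javascript
// Repeated multiplication: the integer parts are a_-1, a_-2, ...
function fractionToRadix(fraction, radix, maxDigits) {
  const digits = "0123456789ABCDEF";
  let result = "0.";
  for (let k = 0; k < maxDigits && fraction > 0; k++) {
    const product = fraction * radix;
    const integer = Math.floor(product); // coefficient a_-(k+1)
    result += digits[integer];
    fraction = product - integer;        // new fraction for the next step
  }
  return result;
}

console.log(fractionToRadix(0.1245, 2, 6)); // 0.000111
console.log(fractionToRadix(0.5, 2, 6));    // 0.1 (stops when the fraction becomes 0)
```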


Decimal to Octal Conversion 

We follow the same procedure as in the case of decimal to binary, except that
the base is now changed, i.e., instead of 2 it is 8. So, wherever we have
used 2 for multiplication or division (in the case of decimal to binary), we
use 8 in the case of octal conversion.

e.g., $(41)_{10} = (51)_8$

Decimal to Octal Conversion
Integer ÷ 8    Quotient    Remainder    Coefficient
41 ÷ 8         5           1            $a_0 = 1$
5 ÷ 8          0           5            $a_1 = 5$







Binary to Octal or Hexadecimal Conversion


Since $2^3 = 8$ and $2^4 = 16$, each octal digit corresponds to three binary
digits and each hexadecimal digit corresponds to four binary digits. The
conversion from binary to octal is easily accomplished by partitioning the
binary number into groups of 3 digits each, as given below.

$$(10\ 110\ 001\ 101\ 011\ .\ 111\ 100\ 000\ 010)_2 = (26153.7402)_8$$

Groups: 10 → 2, 110 → 6, 001 → 1, 101 → 5, 011 → 3 . 111 → 7, 100 → 4, 000 → 0, 010 → 2

Conversion from binary to hexadecimal is similar, except that the binary
number is divided into groups of 4 digits.

$$(10\ 1100\ 0110\ 1011\ .\ 1111\ 0000\ 0010)_2 = (2C6B.F02)_{16}$$

Groups: 10 → 2, 1100 → 12 = C, 0110 → 6, 1011 → 11 = B . 1111 → 15 = F, 0000 → 0, 0010 → 2


Key Points


+ Partitioning is done from right to left for the integer part and from left to
right for the fractional part; an incomplete group is padded with 0s (on the left
for the integer part, on the right for the fractional part).
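The grouping rule can be sketched as a function that works for both octal (groups of 3) and hexadecimal (groups of 4). The name `binaryToBase` is illustrative. The test value uses the bit pattern of the example, with the fraction bits taken as 111100000010, which gives the stated results .7402 and .F02. The integer part is grouped leftwards from the binary point and padded on the left; the fraction is grouped rightwards and padded on the right.

```javascript
// Binary to octal (groupSize 3) or hexadecimal (groupSize 4) by grouping bits.
function binaryToBase(bin, groupSize) {
  const digits = "0123456789ABCDEF";
  const [intPart, fracPart = ""] = bin.split(".");
  const toDigit = (group) => digits[parseInt(group, 2)];

  // Pad to a whole number of groups: integer on the left, fraction on the right.
  const intPadded = intPart.padStart(Math.ceil(intPart.length / groupSize) * groupSize, "0");
  const fracPadded = fracPart.padEnd(Math.ceil(fracPart.length / groupSize) * groupSize, "0");

  let intResult = "";
  for (let i = 0; i < intPadded.length; i += groupSize) {
    intResult += toDigit(intPadded.slice(i, i + groupSize));
  }
  let fracResult = "";
  for (let i = 0; i < fracPadded.length; i += groupSize) {
    fracResult += toDigit(fracPadded.slice(i, i + groupSize));
  }
  return fracResult ? intResult + "." + fracResult : intResult;
}

const n = "10110001101011.111100000010";
console.log(binaryToBase(n, 3)); // 26153.7402
console.log(binaryToBase(n, 4)); // 2C6B.F02
```

A trailing group such as "10" in a fraction becomes "100" (octal 4), not "010" (octal 2). This is why the padding side matters.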


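Binary to Decimal Conversion

Each bit is multiplied by 2 raised to its position (positions 6 to 0 to the left
of the binary point, −1 to −3 to the right of it):

$$(1001001.011)_2 = 1 \times 2^6 + 0 \times 2^5 + 0 \times 2^4 + 1 \times 2^3 + 0 \times 2^2 + 0 \times 2^1 + 1 \times 2^0 + 0 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-3}$$

$$= 64 + 8 + 1 + 0.25 + 0.125 = (73.375)_{10}$$


Octal to Decimal Conversion


The same technique is applied here as in the case of binary to
decimal, except that the radix is 8, so we use powers of 8.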
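Conversion from any base r to decimal uses the positional formula $a_n r^n + \cdots + a_0 r^0 + a_{-1} r^{-1} + \cdots$. The sketch below (the name `toDecimal` is illustrative) multiplies each digit by the radix raised to its position and adds the results. It covers binary, octal and the other bases up to 16.

```javascript
// Base-r string (optionally with a fraction) to a decimal number.
function toDecimal(str, radix) {
  const [intPart, fracPart = ""] = str.split(".");
  let value = 0;
  for (let i = 0; i < intPart.length; i++) {
    const power = intPart.length - 1 - i; // positions n, ..., 1, 0
    value += parseInt(intPart[i], radix) * radix ** power;
  }
  for (let i = 0; i < fracPart.length; i++) {
    value += parseInt(fracPart[i], radix) * radix ** -(i + 1); // positions -1, -2, ...
  }
  return value;
}

console.log(toDecimal("1001001.011", 2)); // 73.375
console.log(toDecimal("51", 8));          // 41
console.log(toDecimal("7182", 10));       // 7182
```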


Complements 


Complements are used in digital computer systems for simplifying the
subtraction operation and for logical manipulation.


There are two types of complements for each base-r system:
1. The r's complement 2. The (r − 1)'s complement


The r's Complement

Given a positive number N in base r with an integer part of n digits, the
r's complement of N is defined as $r^n - N$ for $N \neq 0$ and 0 for $N = 0$.

e.g., The 10's complement of $(25.639)_{10}$ is $10^2 - 25.639$:

$$10^2 - 25.639 = 100 - 25.639 = 74.361$$

Here, the number of digits in the integer part is 2, i.e., n = 2.
The (r - 1)'s Complement
Given a positive number N in base r with an integer part of n digits and a fraction part of m digits, the (r - 1)'s complement of N is defined as
r^n - r^-m - N.
• 9's complement of (52520)_10 is (10^5 - 1 - 52520)
  = 99999 - 52520 = 47479
  Because the integer part has 5 digits, r^n = 10^5, and because no fractional part is present, r^-m = 10^0 = 1.
• 9's complement of (0.3267)_10 is (10^0 - 10^-4 - 0.3267)
  = 1 - 0.0001 - 0.3267
  = 0.9999 - 0.3267 = 0.6732
  There is no integer part, so r^n = 10^0 = 1.
• 1's complement of (101100)_2 is (2^6 - 2^0)_10 - (101100)_2
  = (64 - 1)_10 - (101100)_2
  = (63)_10 - (101100)_2
  = 111111 - 101100 = 010011
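The three examples can be reproduced with a small sketch of the formula r^n - r^-m - N. This assumes N is passed as a `decimal.Decimal` along with n (integer digits) and m (fraction digits); the name `r_minus_1_complement` is hypothetical.

```python
from decimal import Decimal


def r_minus_1_complement(n: Decimal, r: int, int_digits: int, frac_digits: int) -> Decimal:
    """(r-1)'s complement of N: r^n - r^-m - N, with n integer digits and m fraction digits."""
    base = Decimal(r)
    return base ** int_digits - base ** -frac_digits - n


print(r_minus_1_complement(Decimal(52520), 10, 5, 0))      # 47479
print(r_minus_1_complement(Decimal("0.3267"), 10, 0, 4))   # 0.6732
# (101100)_2 = 44, so its 1's complement is 63 - 44 = 19 = (010011)_2
print(format(int(r_minus_1_complement(Decimal(44), 2, 6, 0)), "06b"))  # 010011
```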


Key Points

+ For 520, n = 3, but for (052), n = 2.
+ In the latter example, the leading 0 is of no significance.


Alternative Conversion Process 


For 1’s Complement 
Replace 0's by 1's and 1's by 0's.
e.g., 1's complement of (100110)_2 = (011001)_2
For 2's Complement
Start at the Least Significant Bit (LSB) and copy all the zeroes (working from the LSB towards the most significant bit) until the first 1 is reached; then copy that 1 and complement all the remaining bits.
e.g., 2's complement of (100110)_2 = (011010)_2
Here, the rightmost 0 and the first 1 (scanning from right to left) are copied unchanged, and the remaining bits 1001 are complemented to 0110.
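Both shortcut rules can be sketched on bit strings in Python. This is a minimal illustration; the function names `ones_complement` and `twos_complement` are our own, and the input is assumed to contain only the characters 0 and 1.

```python
def ones_complement(bits: str) -> str:
    """Replace every 0 by 1 and every 1 by 0."""
    return "".join("1" if b == "0" else "0" for b in bits)


def twos_complement(bits: str) -> str:
    """Copy bits from the LSB up to and including the first 1; complement the rest."""
    first_one = bits.rfind("1")  # position of the first 1 seen from the right (LSB) end
    if first_one == -1:          # all zeros: the 2's complement of 0 is 0
        return bits
    return ones_complement(bits[:first_one]) + bits[first_one:]


print(ones_complement("100110"))  # 011001
print(twos_complement("100110"))  # 011010
```

The copy-until-first-1 rule gives the same result as adding 1 to the 1's complement, which is how the test below cross-checks it.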


Properties of 2’s Complement 
2's complement representation allows binary arithmetic operations to be used on signed integers, yielding the correct 2's complement result.

2's Complement of Positive and Negative Numbers

• Positive Numbers: Positive 2's complement numbers are represented as simple binary.

• Negative Numbers: Negative 2's complement numbers are represented as the binary number that, when added to a positive number of the same magnitude, equals zero.


Key Points

+ The most significant (leftmost) bit indicates the sign of the integer; therefore, it is sometimes called the sign bit. If the sign bit is zero, the number is greater than or equal to zero (positive).

+ If the sign bit is one, the number is less than zero (negative).


Subtraction Using 1’s Complement 


e.g., (19 - 3) = ?
(19)_10 = (0001 0011)_2
(3)_10 = (0000 0011)_2

Now, we take the 1's complement of 3, i.e., (1111 1100)_2.
Now, we add the binary equivalent of 19 and the 1's complement of 3.
   0001 0011    <- Binary equivalent of 19
 + 1111 1100    <- 1's complement of 3
 1 0000 1111    <- the leading 1 is the carry bit
 +         1    <- add the carry bit back (end-around carry)
   0001 0000    = (16)_10

In the 1's complement method, the carry bit must be added back to the sum.

e.g., (3 - 19) = ?
(3)_10 = (0000 0011)_2
(19)_10 = (0001 0011)_2
Now, we take the 1's complement of 19, i.e., (1110 1100)_2.
Now, we add the binary equivalent of 3 and the 1's complement of 19.
   0000 0011    <- Binary equivalent of 3
 + 1110 1100    <- 1's complement of 19
   1110 1111

Here, there is no carry bit, so we take the 1's complement of the sum and put a negative sign in front of it. This is the final answer.

The 1's complement of (1110 1111)_2 is (0001 0000)_2 = (16)_10.
Adding the negative sign, the answer is -16.
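The end-around-carry rule can be sketched in Python for an 8-bit register, matching the examples above. This is a minimal illustration assuming both operands are unsigned values that fit in `WIDTH` bits; `WIDTH`, `MASK` and `sub_ones_complement` are names we chose.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1  # 1111 1111 for an 8-bit register


def sub_ones_complement(a: int, b: int) -> int:
    """Compute a - b (0 <= a, b <= MASK) by adding the 1's complement of b."""
    total = a + (~b & MASK)          # add the 1's complement of b
    if total > MASK:                 # a carry came out of the MSB
        return (total & MASK) + 1    # drop it and add it back (end-around carry)
    return -(~total & MASK)          # no carry: 1's complement of the sum, with a minus sign


print(sub_ones_complement(19, 3))  # 16
print(sub_ones_complement(3, 19))  # -16
```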


Subtraction Using 2's Complement
e.g., (19 - 3) = ?
(19)_10 = (0001 0011)_2
and (3)_10 = (0000 0011)_2
2's complement of 3 = (1111 1101)_2
Add the binary equivalent of 19 and the 2's complement of 3:
   0001 0011    <- Binary equivalent of 19
 + 1111 1101    <- 2's complement of 3
 1 0001 0000    <- the leading 1 is the carry, which is discarded

So, the result is (0001 0000)_2 = (16)_10.

e.g., (3 - 19) = ?
2's complement of 19 = (1110 1101)_2
   0000 0011    <- Binary equivalent of 3
 + 1110 1101    <- 2's complement of 19
   1111 0000
Since there is no carry bit, we take the 2's complement of the sum. This, with a negative sign, is the final result.
2's complement of (1111 0000)_2 = (0001 0000)_2 = (16)_10
So, the result is -16.




Example of 2's Complement
Find (-3 - 19) using 2's complement. This is a special case: here, we add the 2's complement of 3 and the 2's complement of 19.

   1111 1101    <- 2's complement of 3
 + 1110 1101    <- 2's complement of 19
 1 1110 1010    <- the leading 1 is the carry

Drop the carry and take the 2's complement of the result, because both numbers are negative. The 2's complement of (1110 1010)_2 is (0001 0110)_2 = (22)_10; additionally, we put a minus sign in front of the result, i.e., -22. This is the final result.
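All three 2's complement examples reduce to "add the bit patterns and drop the carry out of the MSB". A minimal Python sketch for an 8-bit register, assuming the true result fits in the register (no overflow); `to_twos`, `from_twos` and `add_twos` are hypothetical names.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1


def to_twos(x: int) -> int:
    """Encode a signed integer as a WIDTH-bit 2's complement pattern."""
    return x & MASK


def from_twos(pattern: int) -> int:
    """Decode a pattern: if the sign bit is 1, the value is minus its 2's complement."""
    if pattern & (1 << (WIDTH - 1)):
        return -((~pattern + 1) & MASK)
    return pattern


def add_twos(a: int, b: int) -> int:
    # Add the two patterns and drop any carry out of the MSB
    return from_twos((to_twos(a) + to_twos(b)) & MASK)


print(add_twos(19, -3))   # 16
print(add_twos(3, -19))   # -16
print(add_twos(-3, -19))  # -22
```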


Binary Codes 


Binary codes are codes in which information is represented in the binary system, with some modification from the original (pure) binary form.


Weighted Binary System 


Weighted binary codes are those which obey the positional weighting principle: each position of a number represents a specific weight.

e.g., 8421, 2421, 5211.


Sequential Code 


A code is said to be sequential when two subsequent codes, seen as 
numbers in binary representation, differ by one. The 8421 and excess-3 
codes are sequential, whereas 2421 and 5211 codes are not. 


Non-weighted Codes 


Non-weighted codes are codes that are not positionally weighted. That is, each position within the binary number is not assigned a fixed value.
Reflective Code


A code is said to be reflective when the code for 9 is the complement of the code for 0, and likewise for 8 and 1, 7 and 2, 6 and 3, and 5 and 4. Codes 2421, 5211 and excess-3 are reflective, whereas the 8421 code is not.


BCD (Binary Coded Decimal) 

It is a straight assignment of the binary equivalent: to encode a decimal number using the common BCD encoding, each decimal digit is stored in a 4-bit number.

Decimal | 0    | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9
BCD     | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 | 1000 | 1001






































The BCD encoding for the number 127 would be
  1    2    7
(0001 0010 0111)  <- BCD equivalent of 127
whereas the pure binary number would be (01111111)_2.
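The digit-by-digit encoding can be contrasted with pure binary in a short Python sketch; `to_bcd` is a name we chose for illustration.

```python
def to_bcd(n: int) -> str:
    """Encode a non-negative integer in BCD: each decimal digit becomes its own 4-bit group."""
    return " ".join(format(int(d), "04b") for d in str(n))


print(to_bcd(127))         # 0001 0010 0111
print(format(127, "08b"))  # 01111111  (pure binary, for comparison)
```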


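BCD Addition
Add (148 + 157) = ?

  148 =   0001   0100   1000
+ 157 =   0001   0101   0111
          0010   1001   1111    <- binary sum of each pair of digits
Units:    1111 > 9, so 1111 + 0110 = 1 0101    -> digit 5, carry 1
Tens:     1001 + 1 = 1010 > 9, so 1010 + 0110 = 1 0000    -> digit 0, carry 1
Hundreds: 0010 + 1 = 0011    -> digit 3
Answer:   0011   0000   0101  = 3 0 5

When the sum of 2 digits (including any carry) is greater than 9, we need to add 6, i.e., 0110, to that digit.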
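The +6 correction can be sketched in Python by working on one 4-bit digit at a time. This is a minimal illustration assuming non-negative decimal inputs; `bcd_add` is a hypothetical name and the result is returned as space-separated 4-bit groups.

```python
def bcd_add(a: int, b: int) -> str:
    """Add two non-negative numbers digit by digit in BCD, applying the +6 correction."""
    da = [int(d) for d in str(a)][::-1]  # least significant digit first
    db = [int(d) for d in str(b)][::-1]
    groups, carry = [], 0
    for i in range(max(len(da), len(db))):
        s = (da[i] if i < len(da) else 0) + (db[i] if i < len(db) else 0) + carry
        if s > 9:                         # not a valid BCD digit: add 0110
            s += 6
        carry, s = s >> 4, s & 0b1111     # bit 4 becomes the decimal carry
        groups.append(format(s, "04b"))
    if carry:
        groups.append("0001")
    return " ".join(reversed(groups))


print(bcd_add(148, 157))  # 0011 0000 0101
```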


2421 Code 


This is a weighted code; its weights are 2, 4, 2 and 1. A decimal number is represented in 4-bit form, and the total weight of the 4 bits is 2 + 4 + 2 + 1 = 9.

Hence, the 2421 code represents the decimal numbers from 0 to 9.

Decimal | 0    | 1    | 2    | 3    | 4    | 5    | 6    | 7    | 8    | 9
2421    | 0000 | 0001 | 0010 | 0011 | 0100 | 1011 | 1100 | 1101 | 1110 | 1111
Excess-3 Code


Excess-3 is a non-weighted code used to represent decimal numbers. The code derives its name from the fact that each binary code is the corresponding 8421 code plus 0011 (3).

e.g.,
Decimal | 8421 | Excess-3
8       | 1000 | 1000 + 0011 = 1011
6       | 0110 | 0110 + 0011 = 1001
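Because each excess-3 group is simply the digit plus 3 written in 4 bits, the table can be reproduced with a one-line Python sketch; `excess3` is our own name.

```python
def excess3(n: int) -> str:
    """Excess-3 code of each decimal digit: its 8421 (BCD) code plus 0011."""
    return " ".join(format(int(d) + 3, "04b") for d in str(n))


print(excess3(8))  # 1011
print(excess3(6))  # 1001
```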
Gray Code 


This is a non-weighted code and is cyclic: it is arranged so that every transition from one value to the next involves only one bit change.


Binary to Gray Code Conversion 
1. Write down the number in binary code.
2. The Most Significant Bit (MSB) of the Gray code is the same as the MSB of the binary code.
3. XOR the MSB with the next bit of the binary number; the result is the next Gray code bit.
4. Repeat step 3 for each following pair of adjacent binary bits until all bits of the binary number have been XORed. The resulting code is the Gray code equivalent of the binary code.

Binary Code  1 1 0 1 0 0 1
Gray Code    1 0 1 1 1 0 1
Conversion of binary code to Gray code
Gray Code to Binary Conversion 
1. Start with the MSB of the Gray-coded number.
2. Copy this bit as the MSB of the binary number.
3. Now, XOR this binary bit with the next bit of the Gray-coded number; the result is the next binary bit.
4. Repeat step 3 until all bits of the Gray-coded number have been used in the XOR operation. The resulting number is the binary equivalent of the Gray number.

Gray Code    1 0 1 1 1 0 1
Binary Code  1 1 0 1 0 0 1
Conversion of Gray code to binary code
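Both procedures can be sketched in Python on bit strings. This is a minimal illustration with hypothetical function names; it reproduces the worked example above and checks the one-bit-change property.

```python
def binary_to_gray(bits: str) -> str:
    """Copy the MSB; every other Gray bit is the XOR of two adjacent binary bits."""
    return bits[0] + "".join(str(int(a) ^ int(b)) for a, b in zip(bits, bits[1:]))


def gray_to_binary(gray: str) -> str:
    """Copy the MSB; each next binary bit = previous binary bit XOR current Gray bit."""
    out = gray[0]
    for g in gray[1:]:
        out += str(int(out[-1]) ^ int(g))
    return out


print(binary_to_gray("1101001"))  # 1011101
print(gray_to_binary("1011101"))  # 1101001
```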
Fixed Point Representation


Because of computer hardware limitations, everything, including the sign of a number, has to be represented by 0s and 1s.


So, for a positive number the leftmost bit (sign bit) is always 0, and for a negative number the sign bit is 1.


Representation of Integers
There are three possible ways to represent a number:
(i) Signed magnitude method
(ii) 1's complement method
(iii) 2's complement method


Signed Magnitude Method 


The number is divided into two parts: a sign bit and the magnitude. In the example below, a 5-bit register is used to represent -6 and +6.

-6 = 10110    (sign bit 1, magnitude 0110)
+6 = 00110    (sign bit 0, magnitude 0110)

Range of Numbers: For an n-bit register, the MSB is the sign bit and the remaining (n - 1) bits hold the magnitude. The lowest negative number is -(2^(n-1) - 1) and the largest positive number is +(2^(n-1) - 1); zero appears twice, as -0 and +0.

Key Points

+ The drawback of the signed magnitude method is that 0 has 2 different representations: 10000, i.e., -0, and 00000, i.e., +0.


1’s Complement Method 


Positive numbers are represented in the same way as in the signed magnitude method. If the number is negative, it is represented using the 1's complement method: we first represent the number with a positive sign and then take the 1's complement of this number.

e.g., Suppose we are using a 5-bit register. The representation of -6 will be as below.
+6 = 00110  ->  -6 = 11001  (i.e., 1's complement of +6)

Range: for an n-bit register, the lowest negative number is -(2^(n-1) - 1) and the largest positive number is +(2^(n-1) - 1), with both -0 and +0.

Key Points

+ The only drawback of the 1's complement method is that there are two different representations for zero, one is -0 and the other is +0.


2’s Complement Method 


Positive numbers are represented in the same way as in signed magnitude. To represent a negative number, we take the 2's complement of the corresponding positive number.

+6 = 00110  ->  -6 = 11010  (i.e., 2's complement of +6)

Range: for an n-bit register, the lowest negative number is -(2^(n-1)) and the largest positive number is +(2^(n-1) - 1); zero has a single representation.
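The three integer representations and their ranges can be compared in a short Python sketch. This is a minimal illustration with hypothetical names (`encode`, `value_range`); it assumes |x| fits in the n - 1 magnitude bits, so the extra 2's complement value -2^(n-1) is not handled by `encode`.

```python
def encode(x: int, n: int, method: str) -> str:
    """Encode x in an n-bit register.

    method is "signed magnitude", "1's complement" or "2's complement".
    Assumes |x| fits in n - 1 magnitude bits.
    """
    magnitude = format(abs(x), f"0{n - 1}b")
    if x >= 0:
        return "0" + magnitude                 # positive numbers are the same in all three methods
    if method == "signed magnitude":
        return "1" + magnitude                 # sign bit 1, magnitude unchanged
    ones = "".join("1" if b == "0" else "0" for b in "0" + magnitude)
    if method == "1's complement":
        return ones                            # complement every bit of +|x|
    return format(int(ones, 2) + 1, f"0{n}b")  # 2's complement = 1's complement + 1


def value_range(n: int) -> dict:
    """(lowest, largest) value of an n-bit register for each method."""
    half = 2 ** (n - 1)
    return {
        "signed magnitude": (-(half - 1), half - 1),  # both +0 and -0 exist
        "1's complement": (-(half - 1), half - 1),    # both +0 and -0 exist
        "2's complement": (-half, half - 1),          # only one zero
    }


for method in ("signed magnitude", "1's complement", "2's complement"):
    print(method, encode(-6, 5, method), value_range(5)[method])
```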


Floating Point Representation 


A floating point number can be represented using two parts. The first is called the mantissa (m) and the other is the exponent (e). Thus, in a number system with base r, a floating point number with mantissa m and exponent e is represented as (m x r^e).

Register layout: | Sign bit | Exponent (biased form) | Mantissa |

The value of m may be a fraction or an integer. Thus, the number (2.25)_10 can be represented as 0.225 x 10^1.

Here, m = 0.225, e = 1 and r = 10.

For an n-bit register, the MSB is the sign bit and the remaining (n - 1) bits are the magnitude.

So, the largest positive number that can be stored is (2^(n-1) - 1) and the lowest negative number is -(2^(n-1) - 1), with both -0 and +0.





Actual Number Finding Technique


The exponent is always stored as a positive number. The biased number is also called the excess number. Since the exponent is stored in biased form, the bias number is added to the actual exponent of the given number. The actual number can be calculated from the contents of the registers using the following formula:

Actual number = (-1)^S x (1 + m) x 2^(e - Bias)

where S = sign bit
      m = mantissa value of the register
      e = exponent value of the register
      Bias = bias number; if k bits are used to represent the exponent, then Bias = 2^(k-1) - 1

Range of the actual exponent = -(2^(k-1) - 1) to 2^(k-1)
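The formula can be evaluated with a small Python sketch. It assumes, for illustration only, an 8-bit exponent (so Bias = 127, as in IEEE 754 single precision) and mantissa bits given as the fraction bits that follow the implied leading 1. The name `actual_number` is hypothetical; `struct.unpack` is used only to cross-check against the same bit pattern read as an IEEE 754 single.

```python
import struct


def actual_number(sign: int, e: int, mantissa_bits: str, k: int) -> float:
    """(-1)^S x (1 + m) x 2^(e - Bias), with Bias = 2^(k-1) - 1 for a k-bit exponent.

    e is the stored (biased) exponent; mantissa_bits are the fraction bits
    after the implied leading 1.
    """
    bias = 2 ** (k - 1) - 1
    m = sum(int(b) * 2.0 ** -(i + 1) for i, b in enumerate(mantissa_bits))
    return (-1) ** sign * (1 + m) * 2.0 ** (e - bias)


# 6.5 = (1.101)_2 x 2^2, so with k = 8 (Bias = 127) the stored exponent is 2 + 127 = 129
print(actual_number(0, 129, "101", 8))  # 6.5

# Same pattern read as IEEE 754 single precision: 0 | 10000001 | 1010...0 = 0x40D00000
print(struct.unpack(">f", bytes.fromhex("40D00000"))[0])  # 6.5
```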


Boolean Algebra 


Boolean algebra is an algebraic structure defined on a set of elements together with two binary operators (+) and (.).


Closure
For any x and y in the set A, x + y and x . y are also in A.


Duality

If an expression contains only the operations AND, OR and NOT, then the dual of that expression is obtained by replacing each AND by OR, each OR by AND, all occurrences of 1 by 0 and all occurrences of 0 by 1.




Table of Some Basic Theorems

Law/Theorem       | Law of Addition             | Law of Multiplication
Identity Law      | x + 0 = x                   | x . 1 = x
Complement Law    | x + x' = 1                  | x . x' = 0
Idempotent Law    | x + x = x                   | x . x = x
Dominant Law      | x + 1 = 1                   | x . 0 = 0
Involution Law    | (x')' = x                   |
Commutative Law   | x + y = y + x               | x . y = y . x
Associative Law   | x + (y + z) = (x + y) + z   | x . (y . z) = (x . y) . z
Distributive Law  | x . (y + z) = x . y + x . z | x + y . z = (x + y) . (x + z)
De Morgan's Law   | (x + y)' = x' . y'          | (x . y)' = x' + y'
Absorption Law    | x + (x . y) = x             | x . (x + y) = x
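Since each variable is only 0 or 1, any law in the table can be verified by trying every combination. A minimal Python sketch, using `|` for OR, `&` for AND and `1 - a` for NOT; the helper names are our own.

```python
from itertools import product


def holds(law, n_vars: int) -> bool:
    """A law holds if both sides agree for all 2^n combinations of 0 and 1."""
    return all(law(*values) for values in product((0, 1), repeat=n_vars))


def NOT(a: int) -> int:
    return 1 - a


# Each entry: (law as a predicate, number of variables)
laws = {
    "Distributive (x + y.z)": (lambda x, y, z: (x | (y & z)) == ((x | y) & (x | z)), 3),
    "De Morgan (OR)": (lambda x, y: NOT(x | y) == (NOT(x) & NOT(y)), 2),
    "De Morgan (AND)": (lambda x, y: NOT(x & y) == (NOT(x) | NOT(y)), 2),
    "Absorption": (lambda x, y: (x | (x & y)) == x and (x & (x | y)) == x, 2),
}

for name, (law, n_vars) in laws.items():
    print(name, holds(law, n_vars))  # each line ends in True
```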





Boolean Value: The value of a Boolean variable can be either 1 or 0.


Key Points


+ Principle of duality is useful in determining the complement of a function. 


Boolean Operators 


There are four Boolean operators:
1. AND (.) operator (A . B)
2. OR (+) operator (A + B)
3. NOT (A') operator
4. XOR (⊕) operator (A ⊕ B = A . B' + A' . B)


Operator Precedence 
The operator precedence for evaluating Boolean expressions is
1. Parentheses 2. NOT 3. AND 4. OR.


Boolean Function 

• A Boolean function is an expression formed with binary variables, the binary operators (+, .), the unary operator ('), parentheses and the equal sign.

• For a given value of the variables, the function can take only one value, either 0 or 1.

• A Boolean function can be shown by a truth table. To show a function in a truth table, we need a list of the 2^n combinations of 1's and 0's of the n binary variables and a column showing the combinations for which the function is equal to 1 or 0. So, the table will have 2^n rows, with a column for each input variable and one for the final output.




Canonical and Standard Form 


• A canonical form is a unique representation for a given symbolic expression. If two expressions have the same canonical form, they must represent the same function.

• Boolean functions expressed as a sum of minterms (i.e., a sum of products) or as a product of maxterms (i.e., a product of sums) are said to be in canonical form.

• A minterm is a special product of literals in which each input variable appears exactly once. A function with n variables has 2^n minterms.

• A maxterm is a sum of literals in which each variable appears exactly once. n variables can be combined to form 2^n maxterms. Each maxterm is obtained from an OR of the n variables, with each variable unprimed if the corresponding bit of the binary number is 0 and primed if it is 1.


Wherever the value of the function F_A is 1, the corresponding minterm is considered.


Key Points


+ Two functions of n binary variables are said to be equal if they have the same value for all possible 2^n combinations of the n variables.


Minterm and Maxterm for a Function 
































F_A | x y z | Minterm | Shorthand for Minterms | Maxterm        | Shorthand for Maxterms
0   | 0 0 0 | x'y'z'  | m0                     | (x + y + z)    | M0
0   | 0 0 1 | x'y'z   | m1                     | (x + y + z')   | M1
1   | 0 1 0 | x'yz'   | m2                     | (x + y' + z)   | M2
0   | 0 1 1 | x'yz    | m3                     | (x + y' + z')  | M3
1   | 1 0 0 | xy'z'   | m4                     | (x' + y + z)   | M4
0   | 1 0 1 | xy'z    | m5                     | (x' + y + z')  | M5
1   | 1 1 0 | xyz'    | m6                     | (x' + y' + z)  | M6
1   | 1 1 1 | xyz     | m7                     | (x' + y' + z') | M7









































Representation of Boolean Function 
using Minterm 
A Boolean function may be expressed algebraically from a given truth table by forming a minterm for each combination of the variables which produces a 1 in the function and taking the OR of all those terms. We can refer to the preceding table.

Thus, function F_A will be written as


F_A = x'yz' + xy'z' + xyz' + xyz

Here, we have selected only those minterms for which F_A = 1, and F_A will be represented (using minterms) as

F_A = Σ(2, 4, 6, 7) = Σ(m2, m4, m6, m7)
as F_A = 1 for 010, 100, 110 and 111, whose decimal equivalents are 2, 4, 6 and 7, respectively.


Representation of Boolean Function 
using Maxterm 


For the maxterm representation, use those maxterms for which F_A = 0, i.e., which produce 0 in the function. Then, form the AND of all those maxterms.


Thus, F_A will be given by the Boolean expression below

F_A = (x + y + z) . (x + y + z') . (x + y' + z') . (x' + y + z')
    = M0 . M1 . M3 . M5
F_A will be represented as
F_A = Π(0, 1, 3, 5)
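Reading the minterm and maxterm lists off a truth table is mechanical, as this minimal Python sketch shows for F_A. The function `f_a` is written directly from the sum-of-minterms form, and `1 - v` stands for the complement v'.

```python
from itertools import product


def f_a(x: int, y: int, z: int) -> int:
    """F_A = x'yz' + xy'z' + xyz' + xyz."""
    return (((1 - x) & y & (1 - z)) | (x & (1 - y) & (1 - z))
            | (x & y & (1 - z)) | (x & y & z))


minterms, maxterms = [], []
# product() yields 000, 001, ..., 111, so the loop index is the term number
for index, (x, y, z) in enumerate(product((0, 1), repeat=3)):
    if f_a(x, y, z):
        minterms.append(index)   # F_A = 1: the minterm appears in the sum
    else:
        maxterms.append(index)   # F_A = 0: the maxterm appears in the product

print("F_A = Sum of m", minterms)      # F_A = Sum of m [2, 4, 6, 7]
print("F_A = Product of M", maxterms)  # F_A = Product of M [0, 1, 3, 5]
```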


Karnaugh Map 
• K-map is used to simplify Boolean expressions.


• K-map provides a pictorial method of grouping together expressions with common factors, thereby eliminating unwanted variables.


[Figure: 2 variable K-map for a function F(a, b)]

[Figure: Four variable K-map; rows labelled AB and columns labelled CD, both in the order 00, 01, 11, 10, with each cell holding the corresponding minterm of A, B, C, D]
Rules for K-map
• Adjacent squares that have 1's are combined in groups of 2, 4 or 8, allowing the removal of 1, 2 or 3 variables from a term.


• The maps are considered to "wrap around", so that the top and bottom corresponding squares are adjacent and the left and right squares are also adjacent.


• Make the largest possible groups; every 1 must be covered at least once, and a square may be shared by overlapping groups.
• No redundant or unnecessary groups are allowed.





Don’t Care Condition 


• There are applications where certain combinations of input variables never occur. We don't care what the function output is for these combinations of the variables because they are guaranteed never to occur.


• A don't care combination is marked with an X. When choosing adjacent squares to simplify the function in the map, the X's may be assumed to be either 0 or 1, whichever gives the simpler expression.

• In addition, an X need not be used at all if it does not help to cover a larger area. In each case, the choice depends only on the simplification that can be achieved.


Combinational Circuit 


Combinational circuits are logic circuits that perform arithmetic functions (e.g., addition, subtraction, multiplication and division).

These circuits don't have memory, and the output depends only on the inputs provided. A combinational circuit is a logic circuit containing only logic gates.


Key Points


+ Logic gates are not always required, because simple logic functions can be performed with switches or diodes:
(a) Switches in series (AND function) (b) Switches in parallel (OR function)
(c) Combining IC outputs with diodes (OR function)


Basic Logic Gates 
NOT Gate 


• One input, one output.

• Whatever logical state (1, 0) is applied to the input, the opposite state (0, 1, respectively) will appear at the output.

• Also known as an inverter.

Symbol: A -> NOT -> X = A' (X = A' and X = Ā are the same representation)

Truth Table of NOT Gate

Input A | Output X
0       | 1
1       | 0

If the input is high or true, then the output will be low or false.











OR Gate 


The output will be high / true / 1 if any or all of its inputs are high / true / 1. The output will be low / zero / false only if all of its inputs are low / zero / false.

Symbol: A, B -> OR -> X = A + B

Truth Table for OR Gate
Input A | Input B | Output X = (A + B)
0       | 0       | 0
0       | 1       | 1
1       | 0       | 1
1       | 1       | 1
AND Gate

The output will be high / true / 1 if all of its inputs are high / true / 1. If any of its inputs is low / false / 0, then the output will be low / false / 0.

Symbol: A, B -> AND -> X = A . B

Truth Table for AND Gate
Input A | Input B | Output X = (A . B)
0       | 0       | 0
0       | 1       | 0
1       | 0       | 0
1       | 1       | 1











To change the type of gate, such as changing OR to AND, you must do three things:
• Invert (NOT) each input.
• Change the gate type (OR to AND or AND to OR).
• Invert (NOT) the output.
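The three-step rule is De Morgan's law in gate form, and an exhaustive check over all input combinations confirms it. A minimal Python sketch; the gate functions are our own stand-ins using 0/1 integers.

```python
from itertools import product


def NOT(a: int) -> int:
    return 1 - a


def AND(a: int, b: int) -> int:
    return a & b


def OR(a: int, b: int) -> int:
    return a | b


for a, b in product((0, 1), repeat=2):
    # OR gate: invert the inputs, use an AND gate instead, invert the output
    assert OR(a, b) == NOT(AND(NOT(a), NOT(b)))
    # AND gate: invert the inputs, use an OR gate instead, invert the output
    assert AND(a, b) == NOT(OR(NOT(a), NOT(b)))

print("Both conversions hold for every input combination")
```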


Universal Gates 


NAND and NOR are called universal gates because any Boolean function can be implemented using only NAND gates or only NOR gates.


NAND Gate 


This is an AND gate with the output inverted, or we can say (AND + NOT). Therefore, the output expression of the two-input NAND gate is X = (A . B)'.

Symbol: A, B -> NAND -> X = (A . B)', i.e., X = NOT(A AND B)

Truth Table of NAND Gate
Input A | Input B | Output X = (A . B)'
0       | 0       | 1
0       | 1       | 1
1       | 0       | 1
1       | 1       | 0











NOR Gate 


This is an OR gate with the output inverted, or we can say (OR + NOT).

Therefore, the output expression of the two-input NOR gate is X = (A + B)'.

Symbol: A, B -> NOR -> X = (A + B)', i.e., X = NOT(A OR B)

Truth Table of NOR Gate
Input A | Input B | Output X = (A + B)'
0       | 0       | 1
0       | 1       | 0
1       | 0       | 0
1       | 1       | 0
Other Types of Gates 


Exclusive OR (EX-OR / XOR) Gate 


The XOR gate gives 1 as output only if its two inputs are different. If the inputs are the same, the output will be 0. Unlike standard OR / NOR and AND / NAND functions, the XOR function always has exactly two inputs.

Symbol: A, B -> XOR -> X = A ⊕ B = A . B' + A' . B
If A and B are different, then the output will be high.

Truth Table of XOR Gate
Input A | Input B | Output X = A'B + AB'
0       | 0       | 0
0       | 1       | 1
1       | 0       | 1
1       | 1       | 0








Exclusive NOR (EX-NOR / XNOR) Gate


The XNOR gate produces output 1 only if the inputs are the same. If the inputs are different, the output will be 0.

Symbol: A, B -> XNOR -> X = (A ⊕ B)' = A . B + A' . B'
If A and B are different, then the output will be low or zero.

Truth Table of XNOR Gate
Input A | Input B | Output X
0       | 0       | 1
0       | 1       | 0
1       | 0       | 0
1       | 1       | 1
Substituting One Type of Gate for Another


Any Logic Gate can be Built from NAND or NOR Gates


NAND or NOR gates can be combined to create any type of gate. This enables a circuit to be built from just one type of gate, either NAND or NOR. e.g., an AND gate is a NAND gate followed by a NOT gate (to undo the inverting function).


Key Points


+ AND and OR gates can't be used to create other gates because they lack the inverting (NOT) function.


e.g., an OR gate can be built from NOTed inputs fed into a NAND (AND + NOT) gate.
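The NAND constructions can be modelled in Python with a single `nand` primitive. This is a minimal sketch with our own function names; the comments count NAND gates, and the NOR version uses 4 NAND gates, as in the chart of NAND equivalents.

```python
from itertools import product


def nand(a: int, b: int) -> int:
    return 1 - (a & b)


def not_gate(a: int) -> int:
    return nand(a, a)                       # 1 NAND gate with both inputs tied together


def and_gate(a: int, b: int) -> int:
    return not_gate(nand(a, b))             # NAND followed by a NAND used as NOT (2 gates)


def or_gate(a: int, b: int) -> int:
    return nand(not_gate(a), not_gate(b))   # NOTed inputs fed into a NAND (3 gates)


def nor_gate(a: int, b: int) -> int:
    return not_gate(or_gate(a, b))          # 4 NAND gates in total


for a, b in product((0, 1), repeat=2):
    assert not_gate(a) == 1 - a
    assert and_gate(a, b) == (a & b)
    assert or_gate(a, b) == (a | b)
    assert nor_gate(a, b) == 1 - (a | b)
print("NOT, AND, OR and NOR built from NAND alone")
```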


Chart for NAND Equivalents
Gate | Equivalent in NAND Gates
[Figure: NOR gate and its equivalent built from NAND gates; here we have used 4 NAND gates]
Combinational Circuit for
Arithmetic Operations


Half Adder 


Addition of 2 binary digits requires 2 inputs and 2 outputs, one for the result (sum) and one for the carry.


The combinational circuit for the result is an XOR gate, S = x ⊕ y, and the combinational circuit for the carry is an AND gate, C = x . y. Combining both, we get the half adder circuit.


[Figure: Half adder logic diagram: Sum = x ⊕ y, Carry = x . y]


Truth Table for Half Adder 




















Inputs | Outputs
x | y | S = x ⊕ y (Result) | C = x . y (Carry)
0 | 0 | 0                  | 0
0 | 1 | 1                  | 0
1 | 0 | 1                  | 0
1 | 1 | 0                  | 1
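The half adder's two equations translate directly into Python bit operators. A minimal sketch that prints the truth table above; `half_adder` is our own name.

```python
from itertools import product


def half_adder(x: int, y: int) -> tuple[int, int]:
    """Return (S, C): S = x XOR y is the result bit, C = x AND y is the carry."""
    return x ^ y, x & y


print("x y S C")
for x, y in product((0, 1), repeat=2):
    s, c = half_adder(x, y)
    print(x, y, s, c)
```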
Full Adder 


A full adder is a combinational circuit that forms the arithmetic sum of 3 input bits. It consists of 3 inputs and 2 outputs. Two of the input variables are the 2 significant bits to be added. The 3rd input represents the carry from the previous lower significant position.
[Figure: Full adder logic diagram: Sum S = (x ⊕ y) ⊕ z, Carry C = (x ⊕ y) . z + x . y]

S = z ⊕ (x ⊕ y) = z'(xy' + x'y) + z(xy' + x'y)'
S = z'(xy' + x'y) + z(xy + x'y')
S = xy'z' + x'yz' + xyz + x'y'z
and the carry output is C = z(xy' + x'y) + xy = xy'z + x'yz + xy
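The full adder equations can be sketched in Python and chained from the LSB, with each carry feeding the next position's third input. The chaining helper `ripple_add` is our own illustration of that "carry from the previous lower significant position" and assumes equal-length binary strings.

```python
def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """S = (x XOR y) XOR z and C = (x XOR y).z + x.y, where z is the incoming carry."""
    p = x ^ y
    return p ^ z, (p & z) | (x & y)


def ripple_add(a: str, b: str) -> str:
    """Add two equal-length binary strings by chaining full adders from the LSB."""
    carry, out = 0, []
    for x, y in zip(reversed(a), reversed(b)):
        s, carry = full_adder(int(x), int(y), carry)
        out.append(str(s))
    return str(carry) + "".join(reversed(out))


print(ripple_add("0011", "0101"))  # 01000
```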
Positive and Negative Logic 


• If the signal that activates the circuit (the 1 state) has a voltage level that is more positive than the 0 state, the logic polarity is considered to be positive. Thus, in positive logic, 1 is considered the high value and 0 the low value.


• If the signal that activates the circuit (the 1 state) has a voltage level that is more negative than the 0 state, the logic polarity is considered to be negative. Thus, in negative logic, 1 is considered the low value and 0 the high value.


Decoders 


A decoder is a combinational circuit that converts binary information from n input lines to a maximum of 2^n unique output lines. If the n-bit decoded information has unused or don't care combinations, the decoder will have fewer than 2^n outputs. Decoders are also called n to m line decoders, where m ≤ 2^n.


Truth Table of 3 Bit Binary to Decimal Decoder

x | y | z | Output  | Minterm
0 | 0 | 0 | D0 (0)  | x'y'z'
0 | 0 | 1 | D1 (1)  | x'y'z
0 | 1 | 0 | D2 (2)  | x'yz'
0 | 1 | 1 | D3 (3)  | x'yz
1 | 0 | 0 | D4 (4)  | xy'z'
1 | 0 | 1 | D5 (5)  | xy'z
1 | 1 | 0 | D6 (6)  | xyz'
1 | 1 | 1 | D7 (7)  | xyz
This decoder takes binary values as input and produces a decimal value at the output. Suppose D3 is high; it means the binary combination of 3, that is 011, i.e., x = 0, y = 1 and z = 1.


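[Figure: Combinational circuit for 3 x 8 decoder: inputs x, y, z and their complements x', y', z' produce outputs D0 to D7]

[Figure: IC presentation of decoder: a 3 x 8 decoder block with inputs x, y, z and outputs D0 to D7]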
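Each decoder output is the AND of the three literals of one minterm, which can be sketched in Python as below. The names `literal` and `decoder_3x8` are our own; the outputs are returned as a list ordered D0 to D7.

```python
from itertools import product


def literal(value: int, bit: int) -> int:
    """The variable itself where the minterm bit is 1, its complement where it is 0."""
    return value if bit else 1 - value


def decoder_3x8(x: int, y: int, z: int) -> list[int]:
    """Outputs D0..D7; each Di is the AND of the literals of minterm i."""
    return [literal(x, bx) & literal(y, by) & literal(z, bz)
            for bx, by, bz in product((0, 1), repeat=3)]


print(decoder_3x8(0, 1, 1))  # [0, 0, 0, 1, 0, 0, 0, 0]
```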
Encoder

An encoder is a digital function that performs the reverse operation of a decoder. An encoder has 2^n (or fewer) input lines and n output lines. The output lines generate the binary code for the 2^n input variables.









































Inputs                  | Outputs
D0 D1 D2 D3 D4 D5 D6 D7 | x y z
1  0  0  0  0  0  0  0  | 0 0 0
0  1  0  0  0  0  0  0  | 0 0 1
0  0  1  0  0  0  0  0  | 0 1 0
0  0  0  1  0  0  0  0  | 0 1 1
0  0  0  0  1  0  0  0  | 1 0 0
0  0  0  0  0  1  0  0  | 1 0 1
0  0  0  0  0  0  1  0  | 1 1 0
0  0  0  0  0  0  0  1  | 1 1 1



































From the above truth table, we get
x = D4 + D5 + D6 + D7
y = D2 + D3 + D6 + D7
z = D1 + D3 + D5 + D7




















[Figure: Combinational circuit for 8 x 3 encoder: three OR gates giving x = D4 + D5 + D6 + D7, y = D2 + D3 + D6 + D7 and z = D1 + D3 + D5 + D7]
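The three OR equations can be sketched in Python as follows. This minimal illustration assumes, as the truth table does, that exactly one input line is 1; `encoder_8x3` is a hypothetical name.

```python
def encoder_8x3(d: list[int]) -> tuple[int, int, int]:
    """Outputs (x, y, z) for inputs D0..D7, assuming exactly one input is 1."""
    x = d[4] | d[5] | d[6] | d[7]
    y = d[2] | d[3] | d[6] | d[7]
    z = d[1] | d[3] | d[5] | d[7]
    return x, y, z


print(encoder_8x3([0, 0, 0, 0, 0, 0, 1, 0]))  # (1, 1, 0)
```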
Multiplexer


Multiplexing means transmitting a large number of information units over a smaller number of channels or lines. A digital multiplexer is a combinational circuit that selects binary information from one of many input lines and directs it to the output line. The selection of a particular input line is controlled by a set of selection lines. Normally, there are 2^n input lines and n selection lines whose bit combinations determine which input is selected.


Selection Lines For Input Combinations

S1 | S0 | Output will depend on
0  | 0  | I0
0  | 1  | I1
1  | 0  | I2
1  | 1  | I3

[Figure: 4 x 1 multiplexer block with inputs I0 to I3, selection lines S1 and S0, and one output]

The selection of inputs follows the table above. When S1 = 0 and S0 = 0, the outputs of gates 2, 3 and 4 will be 0, and the output of gate 1 will depend upon I0 (because S0' = 1 and S1' = 1, the output of gate 1 = I0 . 1 . 1 = I0).

[Figure: Combinational circuit for 4 x 1 multiplexer]
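The gate-level description can be sketched in Python as four AND gates feeding one OR gate. The source confirms only that gate 1 passes I0 when S1 S0 = 00; the assignment of gates 2 to 4 to I1 to I3 follows the selection table and is our assumption, as is the name `mux_4x1`.

```python
def mux_4x1(inputs: list[int], s1: int, s0: int) -> int:
    """4 x 1 multiplexer modelled as four 3-input AND gates feeding one OR gate."""
    n1, n0 = 1 - s1, 1 - s0                # complemented selection lines S1', S0'
    gate1 = inputs[0] & n1 & n0            # passes I0 when S1 S0 = 00
    gate2 = inputs[1] & n1 & s0            # passes I1 when S1 S0 = 01
    gate3 = inputs[2] & s1 & n0            # passes I2 when S1 S0 = 10
    gate4 = inputs[3] & s1 & s0            # passes I3 when S1 S0 = 11
    return gate1 | gate2 | gate3 | gate4


data = [1, 0, 0, 1]                        # I0, I1, I2, I3
print([mux_4x1(data, s1, s0) for s1 in (0, 1) for s0 in (0, 1)])  # [1, 0, 0, 1]
```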
Demultiplexer
A demultiplexer is a circuit that receives information on a single line and transmits that information on one of 2^n possible output lines. The output line to which the input is transmitted is chosen by the selection lines.

[Figure: Logic diagram and block diagram of 1 x 4 demultiplexer: one input, selection lines S1 and S0, and outputs O0 to O3]
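A minimal Python sketch of the 1 x 4 demultiplexer, mirroring the multiplexer's AND-gate structure; the function name and the output order O0 to O3 are our own choices.

```python
def demux_1x4(data: int, s1: int, s0: int) -> list[int]:
    """Send `data` to output O(2*S1 + S0); every other output stays 0."""
    n1, n0 = 1 - s1, 1 - s0
    return [data & n1 & n0,   # O0 when S1 S0 = 00
            data & n1 & s0,   # O1 when S1 S0 = 01
            data & s1 & n0,   # O2 when S1 S0 = 10
            data & s1 & s0]   # O3 when S1 S0 = 11


print(demux_1x4(1, 1, 0))  # [0, 0, 1, 0]
```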


Sequential Circuit 


A sequential circuit is a switching circuit whose output depends not only on the present state of its inputs but also on what its input conditions have been in the past.

[Figure: Sequential circuit block diagram: Input -> Combinational circuit -> Output, with memory elements feeding back into the combinational circuit]

Sequential logic output depends on the stored levels and also on the input levels. The memory elements are capable of storing binary information. A sequential circuit is specified by a time sequence of inputs, outputs and internal states.


Key Points


+ There are two types of sequential circuits. This classification depends on the 
timing of their signals. 

+ Asynchronous sequential circuits 

+ Synchronous sequential circuits 
Asynchronous Sequential Circuits
This is a system whose outputs depend upon the order in which its input variables change and can be affected at any instant of time.

Synchronous Sequential Circuits


These circuits have a clock signal as one of their inputs, which forces the output to change only at discrete instants of time. State transitions in such circuits occur only when the clock value is either 0 or 1, or at the rising or falling edge of the clock.


The memory elements used in synchronous sequential circuits are called flip-flops.


Flip-Flops 
These are binary cells capable of storing one bit of information. A flip-flop circuit has two outputs, one for the normal value and one for the complement value of the bit stored in it.


A flip-flop circuit can maintain a binary state indefinitely, until directed by an input signal to switch states.


A Basic Flip-Flop 


























A flip-flop circuit can be constructed from two NAND gates or two NOR gates.

SR Flip-Flop Truth Table (flip-flop using NOR gates)
S.No. | S | R | Q         | Q'
1     | 0 | 0 | Unchanged | Unchanged
2     | 0 | 1 | 0         | 1
3     | 1 | 0 | 1         | 0
4     | 1 | 1 | Undefined | Undefined

[Figure: Flip-flop using NOR gates]

For a better understanding, let us consider the case when S = 0 and R = 1 and the current outputs of the two gates are Q and Q', respectively. These outputs work as inputs for gate 2 and gate 1, respectively. So, for the first cycle:
R = 1 and Q' are the inputs for gate 1, so
Q = (1 + Q')' = 1' = 0, and
S = 0 and Q = 0 are the inputs for gate 2, so
Q' = (0 + 0)' = 1
Now, these outputs again work as inputs for the gates. We keep feeding the values back until we get the same output each time, i.e., until the circuit reaches a stable state.
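The feed-back-until-stable procedure can be sketched in Python for the NOR latch. This is a simplified model under our own assumptions: gate 1 takes R and Q' and drives Q, gate 2 takes S and Q and drives Q', and gate 2 sees the Q just produced by gate 1 within the same cycle. The function names are hypothetical.

```python
def nor(a: int, b: int) -> int:
    return 1 - (a | b)


def settle_nor_latch(s: int, r: int, q: int, q_bar: int, max_cycles: int = 10) -> tuple[int, int]:
    """Feed the outputs back as inputs, cycle by cycle, until they stop changing."""
    for _ in range(max_cycles):
        new_q = nor(r, q_bar)        # gate 1: inputs R and Q'
        new_q_bar = nor(s, new_q)    # gate 2: inputs S and Q
        if (new_q, new_q_bar) == (q, q_bar):
            return q, q_bar          # same output as the previous cycle: stable
        q, q_bar = new_q, new_q_bar
    raise RuntimeError("outputs did not settle")


print(settle_nor_latch(s=0, r=1, q=1, q_bar=0))  # (0, 1)
```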

The diagram below illustrates a basic flip-flop circuit using NAND gates.

[Figure: Flip-flop using NAND gates, with inputs S (Set) and R (Reset) and outputs Q and Q']

Truth Table for NAND Gate Flip-Flop
Input | Output
S | R | Q                  | Q'
0 | 0 | X                  | X
0 | 1 | 0                  | 1
1 | 0 | 1                  | 0
1 | 1 | No change, i.e., Q | No change, i.e., Q'

RS Flip-Flop

An RS flip-flop is similar to an SR latch, but it functions only when the clock pulse is 1 or an active clock edge is present.

In an RS flip-flop, S = 1 sets the next state value of Q to 1 and R = 1 resets the next value of Q to 0.

[Figure: RS flip-flop with inputs S, R and clock pulse CP, and outputs Q and Q']

Truth Table for RS Flip-Flop When Clock Pulse is 1
Present state of Q (Qn) | S | R | Next state of Q (Qn+1)
0                       | 0 | 0 | 0
0                       | 0 | 1 | 0
0                       | 1 | 0 | 1
0                       | 1 | 1 | Invalid
1                       | 0 | 0 | 1
1                       | 0 | 1 | 0
1                       | 1 | 0 | 1
1                       | 1 | 1 | Invalid

From the K-map of the RS flip-flop: Q(n+1) = S + R'Q(n)
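The characteristic equation can be checked against the truth table with a small Python sketch. The name `rs_next` is ours, and treating S = R = 1 as an error (rather than returning a value) is our modelling choice for the "Invalid" rows.

```python
def rs_next(q: int, s: int, r: int) -> int:
    """Next state of a clocked RS flip-flop (clock pulse = 1): Q(n+1) = S + R'Q(n)."""
    if s == 1 and r == 1:
        raise ValueError("S = R = 1 is an invalid input combination")
    return s | ((1 - r) & q)


for q in (0, 1):
    for s, r in ((0, 0), (0, 1), (1, 0)):
        print(q, s, r, "->", rs_next(q, s, r))
```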
D Flip-Flop
• A modification of the RS flip-flop.
• A D flip-flop has 2 inputs: data input (D) and clock pulse.

• It functions only when the clock pulse is 1 or when the appropriate edge of the clock input is encountered.

[Figure: D flip-flop]

Truth Table of D Flip-Flop When Clock Pulse is 1
Present state of Q (Qn-1) | D | Next state of Q (Qn)
0                         | 0 | 0
0                         | 1 | 1
1                         | 0 | 0
1                         | 1 | 1

From the K-map of the D flip-flop: Q = D

JK Flip-Flop


• JK flip-flop is an extended version of the RS flip-flop.


• It has 3 inputs: J, K and Clock Pulse (CP). The J input corresponds to the S input and the K input corresponds to the R input.

[Figure: JK flip-flop]
Truth Table of JK Flip-Flop When Clock Pulse is 1

Present state of Q (Qn-1) | J | K | Next state of Q (Qn)
0                         | 0 | 0 | 0
0                         | 0 | 1 | 0
0                         | 1 | 0 | 1
0                         | 1 | 1 | 1
1                         | 0 | 0 | 1
1                         | 0 | 1 | 0
1                         | 1 | 0 | 1
1                         | 1 | 1 | 0

From the K-map of the JK flip-flop: Q = JQ' + K'Q
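The characteristic equation Q = JQ' + K'Q can be checked row by row against the table with a short Python sketch; `jk_next` and `table` are our own names.

```python
def jk_next(q: int, j: int, k: int) -> int:
    """Next state of a JK flip-flop while the clock pulse is 1: Q = JQ' + K'Q."""
    return (j & (1 - q)) | ((1 - k) & q)


# The truth table, keyed by (present Q, J, K)
table = {
    (0, 0, 0): 0, (0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 1,
    (1, 0, 0): 1, (1, 0, 1): 0, (1, 1, 0): 1, (1, 1, 1): 0,
}
print(all(jk_next(*inputs) == nxt for inputs, nxt in table.items()))  # True
```

The row J = K = 1 shows the JK flip-flop toggling instead of entering the invalid state of the RS flip-flop.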

T Flip-Flop 


• Also known as the toggle flip-flop.
• Frequently used in building counters.
• It has 2 inputs: T and Clock Pulse (CP).


• When T = 1, the flip-flop changes state after the active edge of the clock: if Q(n-1) = 0 (present state of Q), the next state value Q(n) is set to 1; if Q(n-1) = 1, the next state value of Q is reset to 0. When T = 0, no state change occurs.


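Truth Table of T Flip-Flop When Clock Pulse is 1

Present state of Q (Qn-1) | T | Next state of Q (Qn)
0                         | 0 | 0
0                         | 1 | 1
1                         | 0 | 1
1                         | 1 | 0

From the K-map of the T flip-flop: Q(n) = TQ(n-1)' + T'Q(n-1) = T ⊕ Q(n-1)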
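The toggle behaviour is a single XOR, and holding T at 1 across clock pulses shows why T flip-flops are used in counters. A minimal Python sketch; `t_next` is our own name.

```python
def t_next(q: int, t: int) -> int:
    """T flip-flop: toggle when T = 1, hold when T = 0, i.e. Q(n) = T XOR Q(n-1)."""
    return t ^ q


q = 0
states = []
for _ in range(4):          # apply four clock pulses with T held at 1
    q = t_next(q, 1)
    states.append(q)
print(states)  # [1, 0, 1, 0]
```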
Master-Slave Flip-Flops


Constructed from two separate flip-flops. 

One circuit serves as master and the other as a slave. 

Gates 1 through 4 are from master flip-flop and gates 5 through 8 are from 
slave. 

Master works on the positive edge of clock pulse and slave works on the 
negative edge of clock pulse. 

















Circuit diagram of master-slave flip-flop 


Counters 


A sequential circuit that goes through a prescribed sequence of states
upon the application of input pulses is called a counter.


In a counter, the sequence of states may follow a binary count or any other
sequence of states.


Counters are used for counting the number of occurrences of an event
and are useful for generating timing sequences to control operations in a
digital system.

A counter that follows the binary sequence is called a binary counter.

An n-bit binary counter consists of n flip-flops and can count in binary from
0 to 2^n - 1.

An up counter increases in value and a down counter decreases in value.
Asynchronous Counter (Ripple Counter)


In a ripple counter, the output transition of one flip-flop serves as the source
for triggering the next flip-flop. For example, a binary ripple counter consists
of a series connection of complementing flip-flops (T or JK), with the output of
each flip-flop connected to the CP input of the next higher-order flip-flop.






































4 bit binary ripple counter 
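The rippling action can be imitated in software. The Python sketch below is illustrative only; it assumes negative-edge-triggered complementing flip-flops (T = 1), so a flip-flop toggles whenever the next lower-order flip-flop goes from 1 to 0, and the function name `ripple_count` is hypothetical.

```python
def ripple_count(pulses: int, bits: int = 4) -> list[int]:
    """Simulate an n-bit binary ripple counter starting from 0.

    Assumes negative-edge-triggered complementing flip-flops: the input
    pulse complements A0, and every 1 -> 0 transition of A(i) clocks A(i+1).
    Returns the count after each input pulse.
    """
    q = [0] * bits          # q[0] is A0, the least significant flip-flop
    history = []
    for _ in range(pulses):
        i = 0
        while i < bits:
            q[i] ^= 1       # this flip-flop complements
            if q[i] == 1:   # 0 -> 1 transition: no negative edge, ripple stops
                break
            i += 1          # 1 -> 0 transition: clocks the next flip-flop
        history.append(sum(bit << n for n, bit in enumerate(q)))
    return history


print(ripple_count(17))  # counts 1 ... 15, wraps to 0, then 1
```

The inner loop mirrors the main weakness of this design: a single pulse can propagate through every stage (e.g., 0111 to 1000), and in a real circuit each stage adds its own propagation delay before the count settles.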


Synchronous Counter 


A synchronous counter differs from a ripple counter in that clock pulses are
applied to the CP inputs of all flip-flops simultaneously, rather than one at
a time in succession as in a ripple counter.


Whether a flip-flop is complemented or not is determined by the values of
its J and K inputs at the time of the pulse.
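For contrast, here is a sketch of one clock pulse of a synchronous binary up counter. It assumes JK flip-flops wired in the usual way, with J0 = K0 = 1 and Ji = Ki equal to the AND of all lower-order outputs; the helper names `sync_next` and `value` are hypothetical. Every J and K value is computed from the present state before any flip-flop changes, which is what applying the clock to all CP inputs at once means.

```python
def sync_next(q: list[int]) -> list[int]:
    """Apply one clock pulse to a synchronous binary up counter.

    q[0] is the least significant flip-flop. J and K inputs come only from
    the present state: J0 = K0 = 1 and Ji = Ki = A0 AND ... AND A(i-1).
    """
    nxt = []
    enable = 1                       # J0 = K0 = 1
    for bit in q:
        j = k = enable
        nxt.append((j & (1 - bit)) | ((1 - k) & bit))  # JK characteristic equation
        enable &= bit                # AND of all lower-order outputs so far
    return nxt


def value(q: list[int]) -> int:
    """Binary value of the counter, q[0] being the least significant bit."""
    return sum(bit << n for n, bit in enumerate(q))


state = [0, 0, 0, 0]
for _ in range(5):
    state = sync_next(state)
print(state, value(state))  # [1, 0, 1, 0] 5
```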


Registers 
• A register consists of a group of flip-flops with a common clock input.
• Registers are commonly used to store and shift binary data.


Key Points


• An n-bit register has a group of n flip-flops and can store n bits of binary
information.


Shift Register 


It is a register in which binary data can be stored and then shifted left or
right when a clock pulse is applied. Bits shifted out of one end of the register
may be either lost or shifted back in at the other end (rotation).
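A shift register can be sketched as a list of bits. In this hypothetical Python model, `reg[0]` is assumed to be the leftmost flip-flop; `shift_right` and `shift_left` return the bit that falls off the end (lost unless the caller reuses it), and `rotate_right` feeds that bit back in at the other end.

```python
def shift_right(reg: list[int], serial_in: int) -> tuple[list[int], int]:
    """Shift one position right; reg[0] is the leftmost bit.

    serial_in enters on the left; the rightmost bit falls out and is returned.
    """
    return [serial_in] + reg[:-1], reg[-1]


def shift_left(reg: list[int], serial_in: int) -> tuple[list[int], int]:
    """Shift one position left; the leftmost bit falls out and is returned."""
    return reg[1:] + [serial_in], reg[0]


def rotate_right(reg: list[int]) -> list[int]:
    """Rotate: the bit shifted out on the right re-enters on the left."""
    new_reg, _ = shift_right(reg, reg[-1])
    return new_reg


reg = [1, 0, 1, 1]
print(shift_right(reg, 0))  # ([0, 1, 0, 1], 1): the rightmost 1 is lost
print(rotate_right(reg))    # [1, 1, 0, 1]: the rightmost 1 re-enters on the left
```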
Machine Instructions


Computer Instruction
A binary code that specifies a sequence of micro-operations for the computer.


Instruction Code A group of bits that instructs the CPU to perform a
specific operation.


Instruction Set The collection of instructions a processor can execute.


Instruction Representation Each instruction has a unique bit pattern, but for
human readers a corresponding symbolic representation (mnemonic) is defined,
e.g., ADD, SUB, LOAD, etc.














Field                        Meaning
Operation code (OPCODE)      The operation to be performed, e.g., Add, Subtract, Multiply.
Source operand reference     Location where the source operand can be found.
Result operand reference     Location where the result is to be put.
Next instruction reference   Location where the next instruction can be found.

Subparts of instruction


Instruction Cycles 

The instruction cycle consists of the following phases
• Fetching an instruction from memory.
• Decoding the instruction.
• Reading the effective address from memory, if the instruction has an indirect address.
• Executing the instruction.


Instruction Format 


An instruction consists of bits, and these bits are grouped into fields.
Some fields in an instruction format are as follows


1. Opcode field, which specifies the operation to be performed.


2. Address field, designating a memory address or a processor
register.


3. Mode field, specifying the way the operand or effective address is
determined.
Different Types of Instruction Formats
Some common types are as given below

• Three address instruction format
• Two address instruction format
• One address instruction format
• Zero address instruction format


Three Address Instruction Format
• This format contains three address fields (address of operand 1, address
of operand 2 and the address where the result is to be put).
• The address of the next instruction is held in a CPU register called the
Program Counter (PC).

Field   Opcode (Add)   Result address   OP1 address   OP2 address
Bits    8              24               24            24

Here, the number of bytes required to encode an instruction is 10 bytes,
i.e., each address requires 24 bits = 3 bytes. Since there are three
addresses and one opcode field,

3 × 3 + 1 = 10 bytes.

The number of memory accesses required is 7 words (assuming a 24-bit, i.e.,
3-byte, memory word), i.e., 4 words for instruction fetch (10 bytes need 4
words), 2 words for operand fetch and 1 word for the result to be placed back
in memory.


Two Address Instruction Format 
• In this format, there are two address fields and an operation field.
• The result is stored at one of the operand addresses, i.e., either at the
address of the first operand or at the address of the second operand.
• A CPU register called the Program Counter (PC) contains the address of the
next instruction.

Field   Opcode (Add)   Result address   OP1 address
Bits    8              24               24
One Address Instruction Format


There is one address field and an operation field.

This address is that of the first operand.

The second operand and the result are held in a CPU register called the
Accumulator (AC). Since a machine has only one accumulator, it need not be
explicitly mentioned in the instruction.

A CPU register, the Program Counter (PC), holds the address of the next
instruction.

In this scenario, two extra instructions are required to load and store the
accumulator contents.

Field   Opcode (Add)   OP1 address
Bits    8              24

The number of bytes required to encode an instruction is 4 bytes, i.e., each
address requires 24 bits = 3 bytes. Since there is one address field and one
operation code field, 1 × 3 + 1 = 4 bytes.

The number of memory accesses required is 3 words, i.e., 2 words for
instruction fetch + 1 word for operand fetch.





Key Points


• For the two address format: total number of bytes to encode an instruction
= number of address fields × bytes required to store an address + bytes
required to store the operation code = 2 × 3 + 1 = 7 bytes.
• The number of memory accesses required is 6 words, i.e., 3 words for
instruction fetch + 2 words for operand fetch + 1 word for the result to be
placed back in memory.
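The byte and memory-access counts for the three-, two- and one-address formats all follow the same arithmetic. This Python sketch assumes, as the figures above imply, a 1-byte opcode, 3-byte addresses and a 3-byte (24-bit) memory word, so an instruction of b bytes takes ceil(b / 3) words to fetch; `instruction_cost` is a hypothetical name.

```python
import math


def instruction_cost(address_fields: int, operand_fetches: int, result_stores: int,
                     opcode_bytes: int = 1, address_bytes: int = 3,
                     word_bytes: int = 3) -> tuple[int, int]:
    """Return (instruction size in bytes, total memory accesses in words)."""
    size = address_fields * address_bytes + opcode_bytes
    fetch_words = math.ceil(size / word_bytes)  # words needed to fetch the instruction
    return size, fetch_words + operand_fetches + result_stores


print(instruction_cost(3, operand_fetches=2, result_stores=1))  # (10, 7)
print(instruction_cost(2, operand_fetches=2, result_stores=1))  # (7, 6)
print(instruction_cost(1, operand_fetches=1, result_stores=0))  # (4, 3)
```

Fewer address fields give shorter instructions, but the one-address case hides extra work: the accumulator must be loaded and stored by separate instructions.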


Zero Address Instruction Format 


Here, a stack is included in the CPU so that arithmetic and logic
instructions need no address fields.

The operands are pushed onto the stack from memory, and ALU
operations are implicitly performed on the top elements of the stack.
• The address of the next instruction is held in a CPU register called the
program counter.

Field   Opcode (Add)
Bits    8

e.g., Add
Top of stack ← Top of stack + second top of stack
(the two top elements are popped and their sum is pushed back onto the stack).
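A zero-address machine can be imitated with a Python list used as the stack. The mini instruction set here (`PUSH`, `POP`, `ADD`, `SUB`, `MUL`) is a hypothetical sketch: only `PUSH` and `POP` name a memory location, while ALU instructions take both operands from the top of the stack and push the result back. The example evaluates X = (A + B) * (C + D) with assumed values in memory.

```python
import operator

# Hypothetical mini instruction set: only PUSH and POP carry an address;
# ADD, SUB and MUL take their operands implicitly from the top of the stack.
ALU_OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run_stack_program(program: list[tuple], memory: dict[str, int]) -> dict[str, int]:
    stack: list[int] = []
    for instr in program:
        op = instr[0]
        if op == "PUSH":                  # TOS <- M[address]
            stack.append(memory[instr[1]])
        elif op == "POP":                 # M[address] <- TOS
            memory[instr[1]] = stack.pop()
        else:                             # zero-address ALU instruction
            top = stack.pop()             # top of stack
            second = stack.pop()          # second top of stack
            stack.append(ALU_OPS[op](second, top))
    return memory


# X = (A + B) * (C + D)
program = [("PUSH", "A"), ("PUSH", "B"), ("ADD",),
           ("PUSH", "C"), ("PUSH", "D"), ("ADD",),
           ("MUL",), ("POP", "X")]
memory = run_stack_program(program, {"A": 2, "B": 3, "C": 4, "D": 5})
print(memory["X"])  # 45
```

The program is simply the expression in reverse Polish (postfix) order, which is why stack machines pair naturally with postfix evaluation.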


Addressing Modes 


Addressing modes are the ways in which an architecture specifies the operand of
an instruction, or how its effective address is computed. The various addressing
modes are as follows.


Implied Mode 
In this mode the operands are specified implicitly in the definition of an 
instruction. 


Immediate Mode 
In this mode the operand is specified in the instruction itself; that is, an
immediate mode instruction has an operand field rather than an address field.


Register Mode 
In this mode, the operands are in registers. 


Direct Address Mode 


In this mode, the address of the memory location that holds the operand is
included in the instruction. The effective address is the address part of the
instruction.


Indirect Address Mode 


In this mode the address field of the instruction gives the address where 
the effective address is stored in memory. 


Relative Address Mode 


In this mode the content of program counter is added to the address part 
of the instruction to calculate the effective address. 


Indexed Address Mode 
In this mode, the effective address will be calculated as the addition of the 
content of index register and the address part of the instruction. 
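The effective-address rules above can be compared side by side. The following Python sketch is illustrative: the memory contents, program counter, index register and register values are hypothetical, the address field is 500 in every case, and `pc` is assumed to already hold the address of the next instruction. Implied mode is omitted because it has no operand field to interpret.

```python
def fetch_operand(mode: str, address_field: int, memory: dict[int, int],
                  pc: int, index_reg: int, register: int) -> int:
    """Return the operand an instruction would use under each addressing mode.

    `register` is the content of the register named by a register-mode
    instruction; the other modes interpret `address_field`.
    """
    if mode == "immediate":
        return address_field             # the field is the operand itself
    if mode == "register":
        return register                  # operand is held in a CPU register
    if mode == "direct":
        ea = address_field               # EA = address part of the instruction
    elif mode == "indirect":
        ea = memory[address_field]       # the field gives where the EA is stored
    elif mode == "relative":
        ea = pc + address_field          # EA = PC + address part
    elif mode == "indexed":
        ea = index_reg + address_field   # EA = index register + address part
    else:
        raise ValueError(f"unknown addressing mode: {mode}")
    return memory[ea]


# Hypothetical machine state for one instruction whose address field is 500.
memory = {500: 800, 600: 900, 702: 325, 800: 300}
for mode in ("immediate", "register", "direct", "indirect", "relative", "indexed"):
    print(mode, fetch_operand(mode, 500, memory, pc=202, index_reg=100, register=400))
```

With these values the six modes yield 500, 400, 800, 300, 325 and 900 respectively, even though the instruction's address field is the same in every case.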
Data Transfer Instructions


Data transfer instructions cause transfer of data from one location to 
another without changing the information content. 


The most common transfers are between memory and processor registers,
between processor registers and input/output, and between the processor
registers themselves.


Typical Data Transfer Instructions 











Name Mnemonic 
LOAD LD 
STORE ST 
MOVE MOV 
EXCHANGE XCH 
INPUT IN 
OUTPUT OUT 
PUSH PUSH 
POP POP 


Data Manipulation Instructions 


Data manipulation instructions perform operations on data and provide the 
computational capabilities for the computer. 


There are three types of data manipulation instructions. 
1. Arithmetic instructions 
2. Logical and bit manipulation instructions 
3. Shift instructions. 


Typical Arithmetic Instructions 











Name                      Mnemonic
INCREMENT                 INC
DECREMENT                 DEC
ADD                       ADD
SUBTRACT                  SUB
MULTIPLY                  MUL
DIVIDE                    DIV
ADD WITH CARRY            ADDC
SUBTRACT WITH BORROW      SUBB
NEGATE (2's complement)   NEG
Typical Logical and Bit Manipulation Instructions











Name Mnemonic 
CLEAR CLR 
COMPLEMENT COM 
AND AND 
OR OR 
EXCLUSIVE-OR XOR 
CLEAR CARRY CLRC 
SET CARRY SETC 
COMPLEMENT CARRY COMC 
ENABLE INTERRUPT EI 
DISABLE INTERRUPT DI 


Typical Shift Instructions 





Name                  Mnemonic
LOGICAL SHIFT RIGHT   SHR
LOGICAL SHIFT LEFT    SHL
ROTATE RIGHT          ROR
ROTATE LEFT           ROL








Program Control Instructions 


• Program control instructions specify conditions for altering the content of
the program counter, while data transfer and manipulation instructions
specify conditions for data processing operations.

• The change in value of the program counter as a result of the execution of a
program control instruction causes a break in the sequence of instruction
execution.


Typical Program Control Instructions 





Name Mnemonic 
BRANCH BR 
JUMP JMP 
SKIP SKP 
CALL CALL 
RETURN RET 
COMPARE CMP 
TEST TST 
Program Interrupt


Program interrupts are used to handle a variety of problems that arise
out of the normal program sequence.

Program interrupts transfer program control from the currently running
program to a service program as a result of an externally or internally
generated request. Control returns to the original program after the
service program is executed.


Types of Interrupts
There are three major types of interrupts
1. External interrupts
2. Internal interrupts
3. Software interrupts

External interrupts come from Input-Output (I/O) devices or from a timing
device.

Internal interrupts arise from illegal or erroneous use of an instruction or
data.

External and internal interrupts are initiated by signals that occur in the
hardware of the CPU.
A software interrupt is initiated by executing an instruction.


Complex Instruction Set Computer (CISC) 


Computer architecture is often described as the design of the instruction set
for the processor.

A computer with a large number of instructions is classified as a
complex instruction set computer. CISC processors typically have
100 to 250 instructions.

The instructions in a typical CISC processor provide direct manipulation
of operands residing in memory.

As more instructions and addressing modes are incorporated into a
computer, more hardware logic is needed to implement and support
them, and this may slow down the computations.


Reduced Instruction Set Computer (RISC) 


RISC architecture is used to reduce the execution time by simplifying the 
instruction set of the computer. 

RISC processors have relatively few instructions and few addressing modes.
In RISC processors, arithmetic and logic operations are performed on operands
in CPU registers; memory is accessed only through load and store instructions.
Design of Control Unit

• The function of the control unit in a digital computer is to initiate
sequences of micro-operations (operations executed on data stored
in registers are called micro-operations).

• The number of different types of micro-operations available in a
given system is finite.

• When the control signals are generated by hardware using conventional
logic design techniques, the control unit is said to be hardwired.

[Figure: Block diagram of a control unit: control unit, register set and Arithmetic Logic Unit (ALU)]

• Microprogramming is a second alternative for designing the control unit
of a digital computer.








Key Points


• The control unit initiates a series of sequential steps of micro-operations.
At any given time, certain micro-operations are to be initiated, while
others remain idle.

• The control variables at any given time can be represented by a string of 1s
and 0s called a control word.


Peripheral Devices 


• The I/O system provides an efficient mode of communication between the
central system and the outside environment.

• Programs and data must be entered into computer memory for
processing, and results obtained from computations must be displayed
for the user. The most familiar means of entering information into a
computer is a typewriter-like keyboard. The central processing unit, on the
other hand, is an extremely fast device capable of performing operations
at very high speed, far faster than a person can type.

• To use a computer efficiently, a large amount of programs and data must
be prepared in advance and transmitted into a storage medium such as
magnetic tapes or disks. The information on the disk is then transferred
into main memory at a high rate.

e Input or output devices attached to the computer are called the peripheral 
devices. The most common peripherals are keyboards, display units and 
printers. Peripherals that provide auxiliary storage for the system are 
magnetic disks and tapes. 
Input-Output Interface


The input-output interface provides a method for transferring information
between internal storage and external I/O devices.

Peripherals are connected to the central processing unit through special
communication links (the I/O bus).

The I/O bus from the processor is attached to all peripheral interfaces.


I/O Communication

• An I/O bus and interface are needed for communication between the CPU and
peripheral devices for several reasons
(a) The data formats of the CPU's internal memory and of the peripheral (I/O)
devices are different.
(b) The data transfer rates of the CPU and the I/O devices are different.


Asynchronous Data Transfer


The two units, such as the CPU and an I/O interface, are designed
independently of each other. If the registers in the interface do not share
a common clock (global clock) with the CPU registers, then the transfer
between the two units is said to be asynchronous.

Asynchronous data transfer requires control signals to be transmitted
between the communicating units to indicate the time at which data is
being transmitted.

[Figure: Classification of asynchronous data transfer approach: strobe control and handshaking]











Strobe Control 


Strobe is a pulse signal supplied by one unit to another unit to indicate the 
time at which data is being transmitted. 


[Figure: Source-initiated strobe for data transfer: block diagram (source unit, data bus, strobe, destination unit) and timing diagram (valid data on the data bus while the strobe is active)]
[Figure: Destination-initiated strobe for data transfer: block diagram (source unit, data bus, strobe, destination unit) and timing diagram (valid data on the data bus while the strobe is active)]
• The strobe may be activated by either the source or the destination unit.

• The strobe pulse is controlled by the clock pulses in the CPU. The data
bus carries the binary information from the source unit to the destination
unit. In a source-initiated transfer, the strobe is a single line that informs
the destination unit when a valid data word is available on the bus.

• In a destination-initiated transfer, the destination unit activates the strobe
to inform the source to provide the data. The source unit then places the data
on the data bus.


Handshaking 

• The disadvantage of the strobe method is that, if the source unit initiates
the transfer, it has no way of knowing whether the destination unit has
actually received the data item. Similarly, if the destination unit initiates
the transfer, it has no way of knowing whether the source unit has actually
placed the data on the bus. The handshake method solves this problem.

• Unlike the strobe control method, the handshaking method uses two control
signals. One control signal is in the same direction as the data flow on the
bus, from the source to the destination. It is used to inform the destination
unit whether there are valid data on the bus. The second control signal is in
the other direction, from the destination to the source. It is used to inform
the source whether the destination can accept data.


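[Figure: Block diagram of handshaking: source unit and destination unit connected by the data bus, a "data valid" line (source to destination) and a "data accepted" line (destination to source), with its timing diagram]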
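A minimal sketch of source-initiated handshaking, assuming the usual four-step sequence: the source places data and enables "data valid", the destination takes the data and enables "data accepted", the source disables "data valid", and the destination disables "data accepted" to show it is ready again. The two units are modelled as Python threads, and a `threading.Condition` stands in for the two control lines; the function and dictionary key names are hypothetical.

```python
import threading


def handshake_transfer(words: list[int]) -> list[int]:
    """Transfer words from a source thread to a destination thread by handshaking."""
    lines = {"bus": None, "data_valid": False, "data_accepted": False}
    cond = threading.Condition()
    received: list[int] = []

    def source() -> None:
        for word in words:
            with cond:
                cond.wait_for(lambda: not lines["data_accepted"])  # destination ready
                lines["bus"] = word
                lines["data_valid"] = True       # 1. place data, enable "data valid"
                cond.notify_all()
                cond.wait_for(lambda: lines["data_accepted"])
                lines["data_valid"] = False      # 3. disable "data valid"
                lines["bus"] = None              #    bus contents no longer valid
                cond.notify_all()

    def destination() -> None:
        for _ in words:
            with cond:
                cond.wait_for(lambda: lines["data_valid"])
                received.append(lines["bus"])
                lines["data_accepted"] = True    # 2. take data, enable "data accepted"
                cond.notify_all()
                cond.wait_for(lambda: not lines["data_valid"])
                lines["data_accepted"] = False   # 4. disable "data accepted": ready again
                cond.notify_all()

    threads = [threading.Thread(target=destination), threading.Thread(target=source)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received


print(handshake_transfer([5, 7, 9]))  # [5, 7, 9]
```

Because each side waits for the other's signal before moving on, neither unit needs to know the other's speed, which is exactly the problem the strobe method could not solve.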





Synchronous Data Transfer 


In synchronous data transfer, a global (shared) clock is provided to both the
sender and the receiver, so both operate in step with the same clock pulses.
Modes of Transfer


Information received from an external device is usually stored in memory, and
information transferred from the central computer to an external device
originates in the memory unit. This data transfer between the central computer
and I/O devices may be handled in one of several modes.

1. Programmed I/O
2. Interrupt-initiated I/O
3. Direct Memory Access (DMA)


Programmed I/O 


In this mode, each data item is transferred by an instruction in the program.
The transfer is between a CPU register and a peripheral. In the programmed
I/O method, the CPU stays in a program loop until the I/O unit indicates that
it is ready for data transfer. Once a data transfer is initiated, the CPU must
monitor the interface to see when the next transfer can be made. This is
time-consuming, since it keeps the CPU busy needlessly.


Interrupt-initiated I/O 


This mode removes the drawback of the programmed I/O mode. In this mode, the
interrupt facility is used: the interface is instructed to issue an interrupt
request signal when data are available from the device. In the meantime, the
CPU can proceed to execute another program.


Direct Memory Access (DMA) 


In programmed I/O mode, the transfer is between the CPU and peripherals.
In direct memory access mode, however, the interface transfers data into and
out of the memory unit through the memory bus. The CPU initiates the
transfer by supplying the interface with the starting address and the
number of words to be transferred, and then proceeds to execute other
tasks. When the transfer is made, the DMA controller requests memory cycles
through the memory bus. When the request is granted by the memory
controller, the DMA controller transfers the data directly into memory.


The Bus Request (BR) input is used by the DMA controller to request control
of the buses from the CPU. When this input is active, the CPU terminates the
execution of the current instruction and places the address bus, the data bus
and the read/write lines into a high-impedance state. The CPU then activates
the Bus Grant (BG) output to inform the external DMA controller that the buses
are available. The DMA controller now takes control of the buses to conduct
the memory transfer. When the DMA controller terminates the transfer, it
disables the bus request line. The CPU then disables the bus grant and takes
back control of the buses.


[Figure: CPU bus signals for DMA transfer: ABUS (address bus), DBUS (data bus), RD (read), WR (write), BR (bus request), BG (bus grant)]


Parallel Processing 


Parallel processing performs data processing tasks simultaneously for the
purpose of increasing the computational speed of a computer system. Rather
than processing each instruction sequentially, a parallel processing system
performs concurrent data processing to achieve faster execution time and
increased throughput.


Parallel processing has its costs as well: the amount of hardware increases,
and so does the cost of the system. Parallel processing is achieved by
distributing the data among multiple functional units.


Flynn’s Classification 
• M. J. Flynn introduced this classification of parallel processing.

• The classification considers the organisation of a computer system by
the number of instructions and data items that are manipulated
simultaneously.


Key Points

• The sequence of instructions read from memory constitutes an instruction
stream.

• The operations performed on the data in the processor constitute a data
stream.


Flynn’s Classification
Flynn’s classification divides computers into four major groups as follows
• Single Instruction stream, Single Data stream (SISD)
• Single Instruction stream, Multiple Data stream (SIMD)
• Multiple Instruction stream, Single Data stream (MISD)
• Multiple Instruction stream, Multiple Data stream (MIMD)
SISD


It represents the organisation of a single computer containing a control
unit, a processor unit and a memory unit. Instructions are executed
sequentially.

[Figure: SISD computer: control unit (CU), processor unit (PU) and memory, linked by a single instruction stream (IS)]
SIMD 


It represents an organisation that includes many processing units under 
the supervision of a common control unit. All processors receive the same 
instruction from the control unit but operate on different items of data. 


[Figure: SIMD computer: several processing units (PU) under a common control unit, each with its own data stream (DS), connected to a shared memory]


MISD 


Its architecture contains n processor units, each receiving a distinct
instruction stream and operating on the same data stream. The MISD structure
is only of theoretical interest, since no practical system has been
constructed using this organisation.
MIMD


Its organisation refers to a computer system capable of processing several 
programs at the same time. 
Pipelining

Pipeline processing is an implementation technique, where arithmetic 
suboperations or the phases of a computer instruction cycle overlap in 
execution. A pipeline can be visualised as a collection of processing 
segments through which information flows. 

The overlapping of computation is made possible by associating a register 
with each segment in the pipeline. The registers provide isolation between 
each segment. 


General Structure of a 3-Segment Pipeline
[Figure: input → S1 → R1 → S2 → R2 → S3 → R3, with a common clock driving registers R1, R2 and R3]









































• Each segment consists of a combinational circuit Si that performs a
suboperation on the data stream flowing through the pipe. The segments
are separated by registers Ri that hold the intermediate results between
the stages.

• Information flows between adjacent stages under the control of a
common clock applied to all the registers simultaneously.

• The behaviour of a pipeline can be illustrated with a space-time diagram.
This is a diagram that shows the segment utilisation as a function of time.
The horizontal axis displays the time in clock cycles and the vertical axis
gives the segment number.
The space-time diagram shows a four-segment pipeline executing six tasks,
T1 through T6.
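The space-time diagram can be generated from one rule: task i enters segment 1 in clock cycle i and moves down one segment per cycle. This Python sketch (the function name `space_time` is hypothetical) builds the grid for K = 4 segments and n = 6 tasks, which needs K + n - 1 = 9 clock cycles.

```python
def space_time(k: int, n: int) -> list[list[int]]:
    """Space-time diagram of a k-segment pipeline executing n tasks.

    grid[s][c] is the task number (1..n) in segment s+1 during clock cycle
    c+1, or 0 if that segment is idle. Task i enters segment 1 in cycle i
    and advances one segment per cycle, so k + n - 1 cycles are needed.
    """
    cycles = k + n - 1
    grid = []
    for seg in range(1, k + 1):
        row = []
        for clock in range(1, cycles + 1):
            task = clock - seg + 1
            row.append(task if 1 <= task <= n else 0)
        grid.append(row)
    return grid


for seg, row in enumerate(space_time(4, 6), start=1):
    print(f"S{seg}: " + " ".join(f"T{t}" if t else "  " for t in row))
```

The printed rows form the familiar staircase: the pipeline fills during the first K - 1 cycles and drains during the last K - 1.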
Technical Description


Consider a K-segment pipeline with clock cycle time tp used to execute n tasks.
The first task T1 requires a time equal to K·tp to complete, since it passes
through all K segments. The remaining (n - 1) tasks emerge from the pipe at the
rate of one task per clock cycle, and they are completed after a further time
of (n - 1)·tp. Therefore, completing n tasks using a K-segment pipeline requires
K + (n - 1) clock cycles.
A non-pipelined unit that performs the same operation takes a time tn to
complete each task, so the total time required for n tasks is n·tn.
The speedup (S) is the ratio of the non-pipelined processing time to the
equivalent pipelined processing time:

$$S = \frac{n\,t_n}{(K + n - 1)\,t_p}$$


Special Case in Speedup


As the number of tasks increases, n becomes much larger than K - 1, so
K + n - 1 approaches n. The speedup then becomes

$$S = \frac{t_n}{t_p}$$

If the time to process one task is the same in both circuits, tn = K·tp, and the
speedup reduces to S = K, the number of segments.
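A small numerical check of the speedup formula. The values below (K = 4, tp = 20 ns, tn = 80 ns, n = 100) are illustrative assumptions chosen so that tn = K·tp, and the function names are hypothetical.

```python
def pipeline_time(k: int, n: int, tp: float) -> float:
    """Time for n tasks on a k-segment pipeline: (k + n - 1) clock cycles of tp."""
    return (k + n - 1) * tp


def speedup(k: int, n: int, tp: float, tn: float) -> float:
    """S = n*tn / ((k + n - 1)*tp)."""
    return n * tn / pipeline_time(k, n, tp)


# Illustrative numbers: 4 segments, tp = 20 ns, tn = 80 ns (= k * tp).
print(pipeline_time(4, 100, 20))            # 2060 ns instead of 100 * 80 = 8000 ns
print(round(speedup(4, 100, 20, 80), 2))    # 3.88
print(round(speedup(4, 10**6, 20, 80), 2))  # 4.0, approaching tn / tp = k
```

Even with 100 tasks the speedup stays below K = 4, because the K - 1 cycles needed to fill the pipe are never fully amortised.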
Instruction Pipeline


Pipeline processing can occur not only in the data stream but also in the
instruction stream.


• An instruction pipeline reads consecutive instructions from memory while
previous instructions are being executed in other segments.
• This causes the instruction fetch and execute phases to overlap and proceed
simultaneously. For example, a computer with an instruction fetch unit and an
instruction execution unit forms a two-segment pipeline.

• Computers with complex instructions require other phases in addition to
fetch and execute to process an instruction completely.

The instruction cycle is as follows.

• Fetch the instruction from memory.
• Decode the instruction.
• Calculate the effective address.
• Fetch the operands from memory.
• Execute the instruction.
• Store the result in the proper place.


Difficulties in Instruction Pipeline 


1. Resource conflicts These are caused by two segments accessing memory
at the same time. Most of these conflicts can be resolved by using
separate instruction and data memories.


2. Data dependency conflicts These arise when an instruction depends on
the result of a previous instruction but this result is not yet available.

3. Branch difficulties These arise from branch and other instructions
that change the value of the PC.


Key Points
• The memory unit that communicates directly with the CPU is called the main
memory.


• Devices that provide backup storage are called auxiliary memory, e.g.,
magnetic tapes or magnetic disks.


Memory Hierarchy 

• The memory unit is used for storing programs and data. It fulfils the need
for storing information.

• Auxiliary storage added to the main memory capacity enhances the
performance of general purpose computers and makes them more efficient.

• Only those programs and data which are currently needed by the processor
reside in main memory. Information is transferred from auxiliary memory to
main memory when needed.
Cache Memory


• A cache is a small, fast memory used to improve the average access time.
In other words, it is a very high-speed memory that increases the speed of
processing by making current programs and data available to the CPU at a
rapid rate.

• The cache is used for storing segments of programs currently being
executed in the CPU and temporary data frequently needed in the present
calculations.


[Figure: Memory connection in computer system: magnetic disks connect through an I/O processor to main memory; the CPU reaches main memory through the cache memory]























Cache Performance
When the processor needs to read or write a location in main memory, it
first checks whether a copy of that data is in the cache. If so, the processor
immediately reads from or writes to the cache.
Cache hit The required word is found in the cache, so the processor reads or
writes it in the cache line immediately.
Cache miss The processor does not find the required word in the cache and must
access main memory.
Hit ratio The fraction (percentage) of memory accesses satisfied by the cache,
i.e., hit ratio = hits / (hits + misses).

Miss ratio = 1 - Hit ratio
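A small sketch of these ratios. The average access time uses an assumed simple model that the text does not state: every access first checks the cache (time t_cache), and a miss additionally pays the main-memory time t_main. The counts and timings are illustrative, and the function names are hypothetical.

```python
def hit_ratio(hits: int, misses: int) -> float:
    """Fraction of memory accesses satisfied by the cache."""
    return hits / (hits + misses)


def average_access_time(h: float, t_cache: float, t_main: float) -> float:
    """Assumed model: a hit costs t_cache; a miss costs t_cache + t_main."""
    return h * t_cache + (1 - h) * (t_cache + t_main)


h = hit_ratio(hits=900, misses=100)
print(h, 1 - h)                                          # hit ratio and miss ratio
print(average_access_time(h, t_cache=100, t_main=1000))  # about 200 ns
```

Even with a 10% miss ratio, the average access time here is only twice the cache time rather than ten times, which is why a high hit ratio matters so much.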


Key Points


• Cache memory is employed in computer systems to compensate for the
speed differential between main memory access time and processor logic.
Main Memory


The main memory refers to the physical memory, and it is the central
storage unit in a computer system.

The main memory is a relatively large and fast memory used to store
programs and data during computer operation.

The main memory in a general purpose computer is made up of RAM
integrated circuits.


Latency


Latency is the time taken to transfer a block of data from either main
memory or a cache.


As the CPU executes instructions, both the instructions themselves and
the data they operate on must be brought into the registers. Until the
instruction or data is available, the CPU cannot proceed and must wait.
Latency is thus the time the CPU waits to obtain the data. The latency of
the main memory directly influences the efficiency of the CPU.


Auxiliary Memory 


The common auxiliary memory devices used in computer systems are 
magnetic disks and tapes. 


Magnetic Disks 


A magnetic disk is a circular plate constructed of metal or plastic, coated
with magnetised material.

Often, both sides of the disk are used, and several disks may be stacked
on one spindle with read/write heads available on each surface.

All disks rotate together at high speed. Bits are stored on the magnetised
surface in spots along concentric circles called tracks. The tracks are
commonly divided into sections called sectors.


Magnetic Tapes


A magnetic tape is a magnetic recording medium made of a thin magnetisable
coating on a long, narrow strip of plastic film.

Bits are recorded as magnetic spots on the tape along several tracks.
Magnetic tapes can be stopped, started, and moved forward or in reverse.
Read/write heads are mounted one in each track, so that data can be
recorded and read as a sequence of characters.
Mapping of Cache Memory
• The transformation of data from main memory to cache memory is referred to as
a mapping process.
• Three types of mapping procedures are considered
1. Associative mapping
2. Direct mapping
3. Set-associative mapping


Associative Mapping 

• The fastest and most flexible cache organisation uses an associative
memory.

• The associative memory stores both the address and the data of the
memory word.

[Figure: Associative mapping cache: each entry holds an address and its data]

• This organisation permits any location in the cache to store any word from
main memory.


Direct Mapping 

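• Associative memories are expensive compared to Random Access Memories (RAM)
because of the added logic associated with each cell. Direct mapping uses an
ordinary RAM for the cache instead.

• The n-bit CPU address is divided into two fields: a tag field of (n - k) bits
and an index field of k bits. The index selects a word of the cache, and the
tag stored there is compared with the tag field of the address.

Tag: (n - k) bits | Index: k bits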


Set-Associative Mapping 


In direct mapping, two words with the same index but different tags cannot
reside in the cache at the same time. Set-associative mapping improves on this:
each word of cache can store two or more words of memory under the same index
address. Each data word is stored together with its tag, and the tag-data
items in one word of cache are said to form a set.


Key Points


• The number of bits in the index field is equal to the number of address bits
required to access the cache memory.

• In general, if there are 2^k words in cache memory and 2^n words in main
memory, the n-bit memory address is divided into two fields: k bits for the
index field and n - k bits for the tag field.
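A direct-mapped cache can be sketched by splitting each address into tag and index fields. This Python model is illustrative: it assumes 15-bit addresses and a 512-word cache (k = 9, leaving a 6-bit tag), stores only tags rather than data, and the names `split_address` and `count_hits` are hypothetical.

```python
def split_address(address: int, k: int) -> tuple[int, int]:
    """Split an address into (tag, index) for a cache of 2**k words."""
    index = address & ((1 << k) - 1)   # low-order k bits select the cache word
    tag = address >> k                 # remaining n - k bits are kept as the tag
    return tag, index


def count_hits(trace: list[int], k: int) -> int:
    """Count hits of an initially empty direct-mapped cache of 2**k words."""
    tags = [None] * (1 << k)           # tag currently stored at each index
    hits = 0
    for address in trace:
        tag, index = split_address(address, k)
        if tags[index] == tag:
            hits += 1                  # same index, same tag: the word is cached
        else:
            tags[index] = tag          # miss: the new word replaces the old one
    return hits


# 15-bit addresses and a 512-word cache (k = 9): 6-bit tag, 9-bit index.
print([oct(x) for x in split_address(0o02777, 9)])          # ['0o2', '0o777']
print(count_hits([0o00000, 0o02000, 0o00000, 0o02000], 9))  # 0
print(count_hits([0o00000, 0o00001, 0o00000, 0o00001], 9))  # 2
```

The first trace shows the weakness of direct mapping: the two addresses share index 0o000 but have different tags, so they keep replacing each other. A two-way set-associative cache could hold both words under that index at the same time.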
Abbreviations

ALU      Arithmetic Logic Unit
ANSI     American National Standards Institute
API      Application Program Interface
ARP      Address Resolution Protocol
ASCII    American Standard Code for Information Interchange
ASP      Active Server Page or Application Service Provider
ATA      Advanced Technology Attachment
ATM      Asynchronous Transfer Mode
BIOS     Basic Input/Output System
Blob     Binary Large Object
BMP      Bitmap
CAD      Computer-Aided Design
CD       Compact Disc
CD-R     Compact Disc Recordable
CD-ROM   Compact Disc Read-Only Memory
CD-RW    Compact Disc Re-Writable
CDMA     Code Division Multiple Access
CGI      Common Gateway Interface
CISC     Complex Instruction Set Computing
CLOB     Character Large Object
CMOS     Complementary Metal Oxide Semiconductor
CMYK     Cyan Magenta Yellow Black
CPU      Central Processing Unit


CRM      Customer Relationship Management
CRT      Cathode Ray Tube
CSS      Cascading Style Sheet
DBMS     Database Management System
DCIM     Digital Camera IMages
DDL      Data Definition Language
DDR      Double Data Rate
DFS      Distributed File System
DHCP     Dynamic Host Configuration Protocol
DLL      Dynamic Link Library
DMA      Direct Memory Access
DNS      Domain Name System
DOS      Disk Operating System
DRAM     Dynamic Random Access Memory
DSL      Digital Subscriber Line
DTD      Document Type Definition
DV       Digital Video
DVD      Digital Versatile Disc
DVD-R    Digital Versatile Disc Recordable
DVD-RAM  Digital Versatile Disc Random Access Memory
DVD-RW   Digital Versatile Disc Re-Writable
DVI      Digital Visual Interface
DVR      Digital Video Recorder
ECC      Error Correction Code
EDI      Electronic Data Interchange
EXIF     Exchangeable Image File Format
FDDI     Fiber Distributed Data Interface
FIFO     First In, First Out
FTP      File Transfer Protocol
Gbps     Gigabits Per Second
GIF      Graphics Interchange Format
GIGO     Garbage In, Garbage Out
GIS      Geographic Information Systems
GPS      Global Positioning System
GUI      Graphical User Interface
HDD      Hard Disk Drive
HDMI     High-Definition Multimedia Interface
HDTV     High-Definition Television
HDV      High-Definition Video
HTML     HyperText Markup Language
HTTP     HyperText Transfer Protocol
HTTPS    HyperText Transfer Protocol Secure
I/O      Input/Output
ICF      Internet Connection Firewall
ICMP     Internet Control Message Protocol
IDE      Integrated Drive Electronics or Integrated Development Environment
IEEE     Institute of Electrical and Electronics Engineers
IM       Instant Message
IMAP     Internet Message Access Protocol
IP       Internet Protocol
IPX      Internetwork Packet Exchange
ISDN     Integrated Services Digital Network
ISO      International Organization for Standardization








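ISP      Internet Service Provider
IT       Information Technology
IVR      Interactive Voice Response
JPEG     Joint Photographic Experts Group
JRE      Java Runtime Environment
JSON     JavaScript Object Notation
JSP      Java Server Page
Kbps     Kilobits Per Second
KVM      Keyboard, Video and Mouse Switch
LAN      Local Area Network
LCD      Liquid Crystal Display
LDAP     Lightweight Directory Access Protocol
LED      Light-Emitting Diode
LIFO     Last In, First Out
MAC      Media Access Control
MANET    Mobile Ad Hoc Network
Mbps     Megabits Per Second
MIDI     Musical Instrument Digital Interface
MIPS     Million Instructions Per Second
MMS      Multimedia Messaging Service
MPEG     Moving Picture Experts Group
MTU      Maximum Transmission Unit
NetBIOS  Network Basic Input/Output System
NIC      Network Interface Card
NTFS     New Technology File System
OLAP     Online Analytical Processing
OLE      Object Linking and Embedding
OOP      Object-Oriented Programming
OSPF     Open Shortest Path First
P2P      Peer-to-Peer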
PC       Personal Computer
PCB      Printed Circuit Board
PDF      Portable Document Format
PHP      Hypertext Preprocessor
PNG      Portable Network Graphic
POP3     Post Office Protocol (version 3)
PPP      Point-to-Point Protocol
PPPoE    Point-to-Point Protocol over Ethernet
PPTP     Point-to-Point Tunneling Protocol
PRAM     Parameter Random Access Memory
PROM     Programmable Read-Only Memory
PS/2     Personal System/2
RAID     Redundant Array of Independent Disks
RAM      Random Access Memory
RFID     Radio-Frequency Identification
RGB      Red Green Blue
RISC     Reduced Instruction Set Computing
ROM      Read-Only Memory
RPC      Remote Procedure Call
RTE      Runtime Environment
RTF      Rich Text Format
SAN      Storage Area Network
SATA     Serial Advanced Technology Attachment
SCSI     Small Computer System Interface
SD       Secure Digital
SDK      Software Development Kit
SDRAM    Synchronous Dynamic Random Access Memory
SLA      Software License Agreement or Service Level Agreement
SMART    Self-Monitoring, Analysis and Reporting Technology


SMS      Short Message Service
SMTP     Simple Mail Transfer Protocol
SNMP     Simple Network Management Protocol
SOA      Service-Oriented Architecture
SOAP     Simple Object Access Protocol
SQL      Structured Query Language
SRAM     Static Random Access Memory
SSD      Solid State Drive
SSH      Secure Shell
SSID     Service Set Identifier
SSL      Secure Sockets Layer
TCP/IP   Transmission Control Protocol/Internet Protocol
TFT      Thin-Film Transistor
TIFF     Tagged Image File Format
TTL      Time To Live
UAT      User Acceptance Testing
UDDI     Universal Description, Discovery and Integration
UDP      User Datagram Protocol
UML      Unified Modeling Language
UPnP     Universal Plug and Play
UPS      Uninterruptible Power Supply
URI      Uniform Resource Identifier
URL      Uniform Resource Locator
USB      Universal Serial Bus
UTF      Unicode Transformation Format
VCI      Virtual Channel Identifier
VDU      Visual Display Unit
VFAT     Virtual File Allocation Table
VGA      Video Graphics Array
VoIP     Voice over Internet Protocol
VPI      Virtual Path Identifier
VPN      Virtual Private Network
W3C      World Wide Web Consortium
WAN      Wide Area Network
Wi-Fi    Wireless Fidelity
WWW      World Wide Web
XHTML    Extensible Hypertext Markup Language
XML      Extensible Markup Language
XSLT     Extensible Stylesheet Language Transformations


Famous Scientists and their Discoveries

• Wil van der Aalst: Business process management, process mining
• Hal Abelson: Intersection of computing and teaching
• Serge Abiteboul: Database theory
• Samson Abramsky: Game semantics
• Leonard Adleman: RSA, DNA computing
• Frances E Allen: Compiler optimization
• Gene Amdahl: Supercomputer developer, founder of Amdahl Corporation
• Bruce Arden: Programming language compilers (GAT, MAD), virtual memory architecture, MTS
• John Vincent Atanasoff: Computer pioneer
• Ali Aydar: Computer scientist and CEO of Sporcle
• Charles Babbage: Invented the first mechanical computer
• Roland Carl Backhouse: Mathematics of program construction
• John Backus: FORTRAN, Backus-Naur form, first complete compiler
• Rudolf Bayer: B-tree
• Steven M Bellovin: Network security
• Tim Berners-Lee: World Wide Web
• Daniel J Bernstein: Qmail, software as protected speech
• Manuel Blum: Cryptography
• Barry Boehm: Software engineering economics, spiral development
• George Boole: Boolean logic
• Bert Bos: Cascading Style Sheets
• Jonathan Bowen: Z notation, formal methods
• David J Brown: Unified memory architecture, binary compatibility
• Per Brinch Hansen: Concurrency
• Sjaak Brinkkemper: Methodology of product software development
• Tracy Camp: Wireless computing
• Vinton Cerf: Internet, TCP/IP
• Peter Chen: Entity-relationship model, data modeling, conceptual model
• Edgar F Codd: Formulated the relational database model
• Stephen Cook: NP-completeness
• James Cooley: Fast Fourier Transform (FFT)
• Seymour Cray: Cray Research, supercomputer
• Andries van Dam: Computer graphics, hypertext
• Christopher J Date: Proponent of the relational database model
• Richard DeMillo: Computer security, software engineering, educational technology
• Dorothy E Denning: Computer security
• Vinod Dham: P5 Pentium processor
• Whitfield Diffie: Public key cryptography, Diffie-Hellman key exchange
• Edsger Dijkstra: Algorithms, Goto considered harmful, semaphore
• Susan Dumais: Information retrieval
• Brendan Eich: JavaScript, Mozilla
• Philip Emeagwali: Supercomputing
• Douglas Engelbart: Tiled windows, hypertext, computer mouse
• Don Estridge: Led development of the original IBM Personal Computer (PC); known as the father of the IBM PC
• Oren Etzioni: MetaCrawler, Netbot
• David C Evans: Computer graphics
• Edward Felten: Computer security
• Tommy Flowers: Colossus computer
• Robert Floyd: NP-completeness
• Michael Garey: NP-completeness
• Seymour Ginsburg: Formal languages, automata theory, AFL theory, database theory
• Kurt Gödel: Computability; not a computer scientist per se, but his work was invaluable in the field
• Adele Goldberg: Smalltalk
• Ian Goldberg: Cryptographer, off-the-record messaging
• Oded Goldreich: Cryptography, computational complexity theory
• Shafi Goldwasser: Cryptography, computational complexity theory
• Gene Golub: Matrix computation
• Martin Charles Golumbic: Algorithmic graph theory
• James Gosling: Java
• Paul Graham: Viaweb, On Lisp, Arc
• Susan L Graham: Compilers, programming environments
• Jim Gray: Database
• Sheila Greibach: Greibach normal form, AFL theory
• Ramanathan V Guha: RDF, Netscape, RSS, Epinions
• Neil J Gunther: Computer performance analysis, capacity planning
• Peter G Gyarmati: Adaptivity in operating systems and networking
• Richard Hamming: Hamming code, a founder of the Association for Computing Machinery
• Juris Hartmanis: Computational complexity theory
• Martin Hellman: Encryption
• James Hendler: Semantic Web
• John L Hennessy: Computer architecture
• Danny Hillis: Connection Machine
• C A R Hoare: Logic, rigor, Communicating Sequential Processes (CSP)
• John Henry Holland: Genetic algorithms
• John Hopcroft: Compilers
• David A Huffman: Huffman coding, used in data compression
• Watts Humphrey: Personal Software Process (PSP), software quality, Team Software Process (TSP)
• Ivar Jacobson: Unified Modeling Language, Object Management Group
• Cliff Jones: Vienna Development Method (VDM)
• Robert E Kahn: TCP/IP
• Avinash Kak: Digital image processing
• Richard Karp: NP-completeness
• Marek Karpinski: NP optimization problems
• Carl Kesselman: Grid computing
• Stephen Cole Kleene: Kleene closure, recursion theory
• Leonard Kleinrock: ARPANET, queueing theory, packet switching, hierarchical routing
• Andrew Koenig: C++
• Andrey Nikolaevich Kolmogorov: Algorithmic complexity theory
• Robert Kowalski: Logic programming
• John Koza: Genetic programming
• Leslie Lamport: Algorithms for distributed computing, LaTeX
• Manny M Lehman: Laws of Software Evolution
• Max Levchin: Gausebeck-Levchin test, PayPal
• Leonid Levin: Computational complexity theory
• Richard J Lipton: Computational complexity theory
• Barbara Liskov: Programming languages
• Paul Mockapetris: Domain Name System (DNS)
• Cleve Moler: Numerical analysis, MATLAB
• Edward F Moore: Moore machine
• Gordon Moore: Moore's law
• Hans Moravec: Robotics
• Mark Overmars: Game programming
• David Parnas: Information hiding, modular programming
• Yale Patt: Instruction-level parallelism, speculative architectures
• David John Pearson: CADES, computer graphics
• Alan Perlis: ALGOL, Epigrams on Programming
• Radia Perlman: Spanning tree protocol
• Simon Peyton Jones: Functional programming
• William H Press: Numerical algorithms
• Michael O Rabin: Nondeterministic machines
• Dragomir R Radev: Natural language processing, information retrieval
• Brian Randell: Dependability
• Joyce K Reynolds: Internet
• Dennis Ritchie: C (programming language), UNIX
• Ron Rivest: RSA, MD5, RC4
• Colette Rolland: REMORA methodology, meta modelling
• Douglas T Ross: Structured Analysis and Design Technique
• Winston W Royce: Waterfall model
• Rudy Rucker: Mathematician, writer, educator
• James Rumbaugh: Unified Modeling Language, Object Management Group
• Carl Sassenrath: Operating systems, programming languages, Amiga, REBOL
• Mahadev Satyanarayanan: File systems, distributed systems, mobile computing, pervasive computing
• Ben Shneiderman: Human-computer interaction, information visualization
• Larry Stockmeyer: Computational complexity, distributed computing
• Michael Stonebraker: Relational database practice and theory
• Olaf Storaasli: Finite element machine, linear algebra, high-performance computing
• Christopher Strachey: Denotational semantics
• Madhu Sudan: Computational complexity theory, coding theory
• Bert Sutherland: Graphics, Internet
• Andrew S Tanenbaum: Operating systems, MINIX
• Avie Tevanian: Mach kernel team, NeXT, Mac OS X
• Linus Torvalds: Linux kernel, Git
• Godfried Toussaint: Computational geometry, computational music theory
• Joseph F Traub: Computational complexity of scientific problems
• Murray Turoff: Computer-mediated communication
• Alan Turing: British computing pioneer; Turing machine, algorithms, cryptology, computer architecture
• Jeffrey D Ullman: Compilers, databases, complexity theory
• Leslie Valiant: Computational complexity theory, computational learning theory
• David Wagner: Security, cryptography
• Manfred K Warmuth: Computational learning theory
Some Recent Inventions

Equipment Name | Created By
The Stark Hand | Mark Stark
The PrintBrush | Alex Breton
Dynamic Eye Sunglasses | Chris Mullin
The Bed Bug Detective | Chris Goggin
The Medical Mirror | Ming-Zher Poh
Intel 805xx Processor Series

Product Code | Marketing Name(s) | Code Name(s)
80500 | Pentium | P5 (A-step)
80501 | Pentium | P5
80502 | Pentium | P54C, P54CS
80503 | Pentium with MMX Technology | P55C, Tillamook
80521 | Pentium Pro | P6
80522 | Pentium II | Klamath
80523 | Pentium II, Celeron, Pentium II Xeon | Deschutes, Covington, Drake
80524 | Pentium II, Celeron | Dixon, Mendocino
80525 | Pentium III, Pentium III Xeon | Katmai, Tanner
80526 | Pentium III, Celeron, Pentium III Xeon | Coppermine, Cascades
80528 | Pentium 4, Xeon | Willamette (Socket 423), Foster
80529 | Canceled | Timna
80530 | Pentium III, Celeron | Tualatin
80531 | Pentium 4, Celeron | Willamette (Socket 478)
80532 | Pentium 4, Celeron, Xeon | Northwood, Prestonia, Gallatin
80533 | Pentium III | Coppermine (cD0-step)
80534 | Pentium 4 SFF | Northwood (small form factor)
80535 | Pentium M, Celeron M 310-340 | Banias
80536 | Pentium M, Celeron M 350-390 | Dothan
80537 | Core 2 Duo T5xxx, T7xxx, Celeron M 5xx | Merom
80538 | Core Solo, Celeron M 4xx | Yonah
80539 | Core Duo, Pentium Dual-Core T-series | Yonah
80541 | Itanium | Merced
80542 | Itanium 2 | McKinley
80543 | Itanium 2 | Madison
80546 | Pentium 4, Celeron D, Xeon | Prescott (Socket 478), Nocona, Irwindale, Cranford, Potomac
80547 | Pentium 4, Celeron D | Prescott (LGA 775)
80548 | Canceled | Tejas and Jayhawk
80549 | Itanium 2 90xx | Montecito
80550 | Dual-Core Xeon 71xx | Tulsa
80551 | Pentium D, Pentium EE, Dual-Core Xeon | Smithfield, Paxville DP
80552 | Pentium 4, Celeron D | Cedar Mill
80553 | Pentium D, Pentium EE | Presler
80554 | Celeron 800/900/1000 ULV | Shelton
80555 | Dual-Core Xeon 50xx | Dempsey
80556 | Dual-Core Xeon 51xx | Woodcrest
80557 | Core 2 Duo E4xxx, E6xxx, Dual-Core Xeon 30xx, Pentium Dual-Core E2xxx | Conroe
80560 | Dual-Core Xeon 70xx | Paxville MP
80562 | Core 2 Quad, Core 2 Extreme QX6xxx, Quad-Core Xeon 32xx | Kentsfield
80563 | Quad-Core Xeon 53xx | Clovertown
80564 | Xeon 7200 | Tigerton-DC
80565 | Xeon 7300 | Tigerton
80566 | Atom Z5xx | Silverthorne
80567 | Itanium 91xx | Montvale
80569 | Core 2 Quad Q9xxx, Core 2 Extreme QX9xxx, Xeon 33xx | Yorkfield
80570 | Core 2 Duo E8xxx, Xeon 31xx | Wolfdale
80571 | Core 2 Duo E7xxx, Pentium Dual-Core E5xxx, Pentium Dual-Core E2210 | Wolfdale-3M
80573 | Xeon 5200 | Wolfdale-DP
80574 | Core 2 Extreme QX9775, Xeon 5400 | Harpertown
80576 | Core 2 Duo P7xxx, T8xxx, P8xxx, T9xxx, P9xxx, SL9xxx, SP9xxx, Core 2 Extreme X9xxx | Penryn
80577 | Core 2 Duo P7xxx, P8xxx, SU9xxx, T6xxx, T8xxx | Penryn-3M
80578 | LE80578 | Vermilion Range
80579 | EP80579 | Tolapai
80580 | Core 2 Quad Q8xxx, Q9xxx, Xeon 33xx | Yorkfield-6M
80581 | Core 2 Quad Q9xxx | Penryn-QC
80582 | Xeon 74xx | Dunnington
80583 | Xeon 74xx | Dunnington-QC
80584 | Xeon X33x3 LV | Yorkfield CL
80585 | Core 2 Solo SU3xxx, Celeron 7xx, 98x | Penryn-L
80586 | Atom 2xx, N2xx | Diamondville
80587 | Atom 3xx | Diamondville DC
Intel 806xx Processor Series

Product Code | Marketing Name(s) | Code Name(s)
80601 | Core i7, Xeon 35xx | Bloomfield
80602 | Xeon 55xx | Gainestown
80603 | Itanium 93xx | Tukwila
80604 | Xeon 65xx, Xeon 75xx | Beckton
80605 | Core i5-7xx, Core i7-8xx, Xeon 34xx | Lynnfield
80606 | Canceled | Havendale
80607 | Core i7-7xx QM, Core i7-8xx QM, Core i7-9xx XM | Clarksfield
80608 | Canceled | Auburndale
80609 | Atom Z6xx | Lincroft
80610 | Atom N400, D400, D500 | Pineview
80611 | Canceled | Larrabee
80612 | Xeon C35xx, Xeon C55xx | Jasper Forest
80613 | Core i7-9xxX, Xeon 36xx | Gulftown
80614 | Xeon 56xx | Westmere-EP
80615 | Xeon E7-28xx, Xeon E7-48xx, Xeon E7-88xx | Westmere-EX
80616 | Pentium G6xxx, Core i3-5xx, Core i5-6xx | Clarkdale
80617 | Core i5-5xx, Core i7-6xxM/UM/LM | Arrandale
80618 | Atom E6x0 | Tunnel Creek
80619 | Core i7-3xxx | Sandy Bridge-EP
80620 | Xeon E5-24xx | Sandy Bridge-EP-8, Sandy Bridge-EP-4
80621 | Xeon E5-16xx, Xeon E5-26xx, Xeon E5-46xx | Sandy Bridge-EP-8, Sandy Bridge-EP-4
80622 | | Sandy Bridge-EP-8
80623 | Xeon E3-xxxx, Core i3/i5/i7-2xxx, Pentium Gxxx, Xeon E3-12xx | Sandy Bridge-HE-4, Sandy Bridge-M-2
80627 | Core i3/i5/i7-2xxxM, Pentium Bxxx, Celeron Bxxx | Sandy Bridge-HE-4, Sandy Bridge-H-2, Sandy Bridge-M-2
80631 | Itanium 95xx | Poulson
80632 | Atom E6x5C | Stellarton
80637 | Core i5/i7-3xxx, Xeon E3 | Ivy Bridge
80638 | Mobile Core i5/i7-3xxxM | Ivy Bridge
80640 | Atom Z24xx | Penwell
80641 | Atom D2xxx, Atom N2xxx | Cedarview
80647 | | Haswell
80649 | Xeon Phi | Knights Corner
80650 | Atom Z27xx | Cloverview

Generations of Programming Languages


• First-generation programming languages (1GL) are pure machine code, that is, just ones and zeros, e.g., 0011001110000101001.

• Second-generation programming languages (2GL) are assembly languages, which replace binary opcodes with symbolic mnemonics.

• Third-generation programming languages (3GL) brought many programmer-friendly features to code, such as loops, conditionals and classes. As a result, one line of third-generation code can produce many lines of object (machine) code, saving a lot of time when writing programs.

• Fourth-generation programming languages (4GL) are designed to reduce programming effort and the time it takes to develop software, which in turn reduces the cost of software development. Examples include database query languages (SQL), report-generation languages (Oracle Reports) and languages for constructing user interfaces (XUL).
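The contrast between the third and fourth generations can be sketched in Python, which we chose because the document contains no code. The same question, "which students scored at least a cutoff?", is answered twice: once with an explicit 3GL-style loop, and once with a declarative SQL query run through Python's built-in `sqlite3` module. The table name, column names, function names and sample data are all hypothetical.

```python
import sqlite3

# Hypothetical sample data: (student name, marks)
MARKS = [("Asha", 82), ("Ravi", 67), ("Meena", 91)]


def toppers_3gl(rows, cutoff):
    """Third-generation style: spell out *how* to get the answer step by step."""
    result = []
    for name, score in rows:
        if score >= cutoff:
            result.append(name)
    result.sort()
    return result


def toppers_4gl(rows, cutoff):
    """Fourth-generation style: state *what* is wanted in SQL; the engine decides how."""
    con = sqlite3.connect(":memory:")
    try:
        # Setup only: build a throwaway in-memory table from the sample rows
        con.execute("CREATE TABLE marks (name TEXT, score INTEGER)")
        con.executemany("INSERT INTO marks VALUES (?, ?)", rows)
        query = "SELECT name FROM marks WHERE score >= ? ORDER BY name"
        return [name for (name,) in con.execute(query, (cutoff,))]
    finally:
        con.close()


print(toppers_3gl(MARKS, 80))  # ['Asha', 'Meena']
print(toppers_4gl(MARKS, 80))  # ['Asha', 'Meena']
```

Most of the SQL version's length is setup, because it creates and fills an in-memory table. The query itself is a single declarative line that states what result is wanted and leaves the looping, filtering and sorting to the database engine.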


